(57) Abstract



A method for selecting recombinant host cells expressing high levels of a desired protein is described. This method utilizes eukaryotic 
host cells harboring a DNA construct comprising a selectable gene (preferably an amplifiable gene) and a product gene provided 3' to 
the selectable gene. The selectable gene is positioned within an intron defined by a splice donor site and a splice acceptor site and the 
selectable gene and product gene are under the transcriptional control of a single transcriptional regulatory region. The splice donor site 
is generally an efficient splice donor site and thereby regulates expression of the product gene using the transcriptional regulatory region. 
The transfected cells are cultured so as to express the gene encoding the product in a selective medium comprising an amplifying agent 
for sufficient time to allow amplification to occur, whereupon either the desired product is recovered or cells having multiple copies of the 
product gene are identified. 
METHOD FOR SELECTING HIGH-EXPRESSING HOST CELLS
BACKGROUND OF THE INVENTION 

Field of the Invention 

This invention relates to a method of selecting for high-expressing
host cells, a method of producing a protein of interest in high yields, and
a method of producing eukaryotic cells having multiple copies of a sequence
encoding a protein of interest.

Description of Background and Related Art

The discovery of methods for introducing DNA into living host cells
in a functional form has provided the key to understanding many fundamental
biological processes, and has made possible the production of important
proteins and other molecules in commercially useful quantities.

Despite the general success of such gene transfer methods, several
common problems exist that may limit the efficiency with which a gene
encoding a desired protein can be introduced into and expressed in a host
cell. One problem is knowing when the gene has been successfully
transferred into recipient cells. A second problem is distinguishing
between those cells that contain the gene and those that have survived the
transfer procedures but do not contain the gene. A third problem is
identifying and isolating those cells that contain the gene and that are
expressing high levels of the protein encoded by the gene.

In general, the known methods for introducing genes into eukaryotic
cells tend to be highly inefficient. Of the cells in a given culture, only
a small proportion take up and express exogenously added DNA, and an even
smaller proportion stably maintain that DNA.

Identification of those cells that have incorporated a product gene
encoding a desired protein typically is achieved by introducing into the
same cells another gene, commonly referred to as a selectable gene, that
encodes a selectable marker. A selectable marker is a protein that is
necessary for the growth or survival of a host cell under the particular
culture conditions chosen, such as an enzyme that confers resistance to an
antibiotic or other drug, or an enzyme that compensates for a metabolic or
catabolic defect in the host cell. For example, selectable genes commonly
used with eukaryotic cells include the genes for aminoglycoside
phosphotransferase (APH), hygromycin phosphotransferase (hyg),
dihydrofolate reductase (DHFR), thymidine kinase (tk), neomycin, puromycin,
glutamine synthetase, and asparagine synthetase.

The method of identifying a host cell that has incorporated one gene
on the basis of expression by the host cell of a second incorporated gene
encoding a selectable marker is referred to as cotransfectation (or
cotransfection). In that method, a gene encoding a desired polypeptide and
a selectable gene typically are introduced into the host cell
simultaneously, although they may be introduced sequentially. In the case
of simultaneous cotransfectation, the gene encoding the desired polypeptide
and the selectable gene may be present on a single DNA molecule or on
separate DNA molecules prior to being introduced into the host cells.
Wigler et al., Cell, 16:777 (1979). Cells that have incorporated the gene
encoding the desired polypeptide then are identified or isolated by
culturing the cells under conditions that preferentially allow for the
growth or survival of those cells that synthesize the selectable marker
encoded by the selectable gene.

The level of expression of a gene introduced into a eukaryotic host
cell depends on multiple factors, including gene copy number, efficiency
of transcription, messenger RNA (mRNA) processing, stability, and
translation efficiency. Accordingly, high-level expression of a desired
polypeptide typically will involve optimizing one or more of those factors.

For example, the level of protein production may be increased by
covalently joining the coding sequence of the gene to a "strong" promoter
or enhancer that will give high levels of transcription. Promoters and
enhancers are nucleotide sequences that interact specifically with proteins
in a host cell that are involved in transcription. Kriegler, Meth.
Enzymol., 185:512 (1990); Maniatis et al., Science, 236:1237 (1987).

Promoters are located upstream of the coding sequence of a gene and
facilitate transcription of the gene by RNA polymerase. Among the
eukaryotic promoters that have been identified as strong promoters for
high-level expression are the SV40 early promoter, adenovirus major late
promoter, mouse metallothionein-I promoter, Rous sarcoma virus long
terminal repeat, and human cytomegalovirus immediate early promoter (CMV).

Enhancers stimulate transcription from a linked promoter. Unlike
promoters, enhancers are active when placed downstream from the
transcription initiation site or at considerable distances from the
promoter, although in practice enhancers may overlap physically and
functionally with promoters. For example, all of the strong promoters
listed above also contain strong enhancers. Bendig, Genetic Engineering,
7:91 (Academic Press, 1988).

The level of protein production also may be increased by increasing
the gene copy number in the host cell. One method for obtaining high gene
copy number is to directly introduce into the host cell multiple copies of
the gene, for example, by using a large molar excess of the product gene
relative to the selectable gene during cotransfectation. Kaufman, Meth.
Enzymol., 185:537 (1990). With this method, however, only a small
proportion of the cotransfected cells will contain the product gene at high
copy number. Furthermore, because no generally applicable, convenient
method exists for distinguishing such cells from the majority of cells that
contain fewer copies of the product gene, laborious and time-consuming
screening methods typically are required to identify the desired
high-copy-number transfectants.

Another method for obtaining high gene copy number involves cloning
the gene in a vector that is capable of replicating autonomously in the
host cell. Examples of such vectors include mammalian expression vectors
derived from Epstein-Barr virus or bovine papilloma virus, and yeast
2-micron plasmid vectors. Stephens & Hentschel, Biochem. J., 248:1 (1987);
Yates et al., Nature, 313:812 (1985); Beggs, Genetic Engineering, 2:175
(Academic Press, 1981).

Yet another method for obtaining high gene copy number involves gene
amplification in the host cell. Gene amplification occurs naturally in
eukaryotic cells at a relatively low frequency. Schimke, J. Biol. Chem.,
263:5989 (1988). However, gene amplification also may be induced, or at
least selected for, by exposing host cells to appropriate selective
pressure. For example, in many cases it is possible to introduce a product
gene together with an amplifiable gene into a host cell and subsequently
select for amplification of the marker gene by exposing the cotransfected
cells to sequentially increasing concentrations of a selective agent.
Typically the product gene will be coamplified with the marker gene under
such conditions.

The most widely used amplifiable gene for that purpose is a DHFR
gene, which encodes a dihydrofolate reductase enzyme. The selection agent
used in conjunction with a DHFR gene is methotrexate (Mtx). A host cell
is cotransfected with a product gene encoding a desired protein and a DHFR
gene, and transfectants are identified by first culturing the cells in
culture medium that contains Mtx. A suitable host cell when a wild-type
DHFR gene is used is the Chinese Hamster Ovary (CHO) cell line deficient
in DHFR activity, prepared and propagated as described by Urlaub & Chasin,
Proc. Natl. Acad. Sci. USA, 77:4216 (1980). The transfected cells then are
exposed to successively higher amounts of Mtx. This leads to the synthesis
of multiple copies of the DHFR gene, and concomitantly, multiple copies of
the product gene. Schimke, J. Biol. Chem., 263:5989 (1988); Axel et al.,
U.S. Patent No. 4,399,216; Axel et al., U.S. Patent No. 4,634,665. Other
references directed to co-transfection of a gene together with a genetic
marker that allows for selection and subsequent amplification include
Kaufman in Genetic Engineering, ed. J. Setlow (Plenum Press, New York),
Vol. 9 (1987); Kaufman and Sharp, J. Mol. Biol., 159:601 (1982); Ringold
et al., J. Mol. Appl. Genet., 1:165-175 (1981); Kaufman et al., Mol. Cell
Biol., 5:1750-1759 (1985); Kaetzel and Nilson, J. Biol. Chem., 263:6244-6251
(1988); Hung et al., Proc. Natl. Acad. Sci. USA, 83:261-264 (1986);
Kaufman et al., EMBO J., 6:87-93 (1987); Johnston and Kucey, Science,
242:1551-1554 (1988); Urlaub et al., Cell, 33:405-412 (1983).

To extend the DHFR amplification method to other cell types, a mutant
DHFR gene that encodes a protein with reduced sensitivity to methotrexate
may be used in conjunction with host cells that contain normal numbers of
an endogenous wild-type DHFR gene. Simonsen and Levinson, Proc. Natl.
Acad. Sci. USA, 80:2495 (1983); Wigler et al., Proc. Natl. Acad. Sci. USA,
77:3567-3570 (1980); Haber and Schimke, Somatic Cell Genetics, 8:499-508
(1982).

Alternatively, host cells may be co-transfected with the product
gene, a DHFR gene, and a dominant selectable gene, such as a neo^r gene. Kim
and Wold, Cell, 42:129 (1985); Capon et al., U.S. Pat. No. 4,965,199.
Transfectants are identified by first culturing the cells in culture medium
containing neomycin (or the related drug G418), and the transfectants so
identified then are selected for amplification of the DHFR gene and the
product gene by exposure to successively increasing amounts of Mtx.

As will be appreciated from this discussion, the selection of
recombinant host cells that express high levels of a desired protein
generally is a multi-step process. In the first step, initial
transfectants are selected that have incorporated the product gene and the
selectable gene. In subsequent steps, the initial transfectants are
subjected to further selection for high-level expression of the selectable
gene and then to random screening for high-level expression of the product
gene. To identify cells expressing high levels of the desired protein,
typically one must screen large numbers of transfectants. The majority of
transfectants produce less than maximal levels of the desired protein.
Further, Mtx resistance in DHFR transformants is at least partially
conferred by varying degrees of gene amplification. Schimke, Cell, 37:705-713
(1984). The inadequacies of co-expression of the non-selected gene
have been reported by Wold et al., Proc. Natl. Acad. Sci. USA, 76:5684-5688
(1979). Instability of the amplified DNA is reported by Kaufman and
Schimke, Mol. Cell Biol., 1:1069-1076 (1981); Haber and Schimke, Cell,
26:355-362 (1981); and Federspiel et al., J. Biol. Chem., 259:9127-9140
(1984).

Several methods have been described for directly selecting such
recombinant host cells in a single step. One strategy involves
co-transfecting host cells with a product gene and a DHFR gene, and selecting
those cells that express high levels of DHFR by directly culturing in
medium containing a high concentration of Mtx. Many of the cells selected
in that manner also express the co-transfected product gene at high levels.
Page and Sydenham, Bio/Technology, 9:64 (1991). This method for single-step
selection suffers from certain drawbacks that limit its usefulness.
High-expressing cells obtained by direct culturing in medium containing a high
level of a selection agent may have poor growth and stability
characteristics, thus limiting their usefulness for long-term production
processes. Page and Sydenham, Bio/Technology, 9:64 (1991). Single-step
selection for high-level resistance to Mtx may produce cells with an
altered, Mtx-resistant DHFR enzyme, or cells that have altered Mtx
transport properties, rather than cells containing amplified genes. Haber
et al., J. Biol. Chem., 256:9501 (1981); Assaraf and Schimke, Proc. Natl.
Acad. Sci. USA, 84:7154 (1987).

Another method involves the use of polycistronic mRNA expression
vectors containing a product gene at the 5' end of the transcribed region
and a selectable gene at the 3' end. Because translation of the selectable
gene at the 3' end of the polycistronic mRNA is inefficient, such vectors
exhibit preferential translation of the product gene, and cells carrying
them require high levels of polycistronic mRNA to survive selection. Kaufman,
Meth. Enzymol., 185:487 (1990); Kaufman, Meth. Enzymol., 185:537 (1990);
Kaufman et al., EMBO J., 6:187 (1987). Accordingly, cells expressing high
levels of the desired protein product may be obtained in a single step by
culturing the initial transfectants in medium containing a selection agent
appropriate for use with the particular selectable gene. However, the
utility of these vectors is variable because of the unpredictable influence
of the upstream product reading frame on selectable marker translation and
because the upstream reading frame sometimes becomes deleted during
methotrexate amplification (Kaufman et al., J. Mol. Biol., 159:601-621
[1982]; Levinson, Methods in Enzymology, San Diego: Academic Press, Inc.
[1990]). Later vectors incorporated an internal translation initiation site,
derived from members of the picornavirus family, positioned between
the product gene and the selectable gene (Pelletier et al., Nature, 334:320
[1988]; Jang et al., J. Virol., 63:1651 [1989]).

A third method for single-step selection involves use of a DNA
construct with a selectable gene containing an intron within which is
located a gene encoding the protein of interest. See U.S. Patent No.
5,043,270 and Abrams et al., J. Biol. Chem., 264(24):14016-14021 (1989).
In yet another single-step selection method, host cells are co-transfected
with an intron-modified selectable gene and a gene encoding the protein of
interest. See WO 92/17566, published October 15, 1992. The intron-modified
gene is prepared by inserting into the transcribed region of a
selectable gene an intron of such length that the intron is correctly
spliced from the corresponding mRNA precursor at low efficiency, so that
the amount of selectable marker produced from the intron-modified
selectable gene is substantially less than that produced from the starting
selectable gene. These vectors help to ensure the integrity of the
integrated DNA construct, but transcriptional linkage is not achieved
because the selectable gene and the product gene are driven by separate
promoters.

Other mammalian expression vectors that have single transcription
units have been described. Retroviral vectors have been constructed (Cepko
et al., Cell, 37:1053-1062 [1984]) in which a cDNA is inserted between the
endogenous Moloney murine leukemia virus (M-MuLV) splice donor and splice
acceptor sites, which are followed by a neomycin resistance gene. This
vector has been used to express a variety of gene products following
retroviral infection of several cell types.

With the above drawbacks in mind, it is one object of the present
invention to increase the level of homogeneity with regard to expression
levels of stable clones transfected with a product gene of interest, by
expressing a selectable marker (DHFR) and the protein of interest from a
single promoter.

It is another object to provide a method for selecting stable,
recombinant host cells that express high levels of a desired protein
product, which method is rapid and convenient to perform, and reduces the
number of transfected cells that need to be screened. Furthermore, it is
an object to allow high levels of single-unit and two-unit polypeptides to
be rapidly generated from clones or pools of stable host cell transfectants.

It is an additional object to provide expression vectors which bias
for active integration events (i.e., have an increased tendency to generate
transformants wherein the DNA construct is inserted into a region of the
genome of the host cell which results in high-level expression of the
product gene) and can accommodate a variety of product genes without the
need for modification.

SUMMARY OF THE INVENTION

Accordingly, the present invention is directed to a DNA construct
(alternatively termed a DNA molecule) comprising a 5' transcriptional
initiation site and a 3' transcriptional termination site; a selectable
gene (preferably an amplifiable gene) and a product gene provided 3' to the
selectable gene; and a transcriptional regulatory region regulating
transcription of both the selectable gene and the product gene, the
selectable gene being positioned within an intron defined by a splice donor
site and a splice acceptor site. The splice donor site preferably comprises
an effective splice donor sequence as herein defined and thereby regulates
expression of the product gene using the transcriptional regulatory region.
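The arrangement recited above can be read as a set of 5'-to-3' ordering constraints. The following Python sketch is illustrative only and is not part of the disclosed method: the element labels ("promoter", "SD", "selectable", "SA", "product", "terminator"), the Element class and the check_layout function are hypothetical names. It also assumes that the product gene lies 3' of the splice acceptor site (outside the intron), which is consistent with the product being expressed from the transcript after the intron is removed.

```python
from dataclasses import dataclass

# Hypothetical element labels, listed 5' to 3'; they are not terms defined in the patent.
REQUIRED = ("promoter", "SD", "selectable", "SA", "product", "terminator")


@dataclass(frozen=True)
class Element:
    kind: str  # one of REQUIRED; any other label (e.g. "enhancer") is ignored
    name: str  # free-text description of the element


def check_layout(elements: list[Element]) -> list[str]:
    """Return layout problems for a 5'-to-3' element list; an empty list means none found."""
    kinds = [e.kind for e in elements]
    # A single transcriptional regulatory region and one copy of each other element.
    problems = [
        f"expected exactly one {k!r}, found {kinds.count(k)}"
        for k in REQUIRED
        if kinds.count(k) != 1
    ]
    if problems:
        return problems
    pos = {k: kinds.index(k) for k in REQUIRED}
    if not pos["promoter"] < pos["SD"]:
        problems.append("splice donor is not 3' to the single promoter")
    if not pos["SD"] < pos["selectable"] < pos["SA"]:
        problems.append("selectable gene is not inside the SD...SA intron")
    if not pos["SA"] < pos["product"]:
        problems.append("product gene is not 3' to the intron")
    if not pos["product"] < pos["terminator"]:
        problems.append("termination site is not 3' to the product gene")
    return problems


if __name__ == "__main__":
    construct = [
        Element("promoter", "transcriptional regulatory region"),
        Element("SD", "WT ras splice donor"),
        Element("selectable", "DHFR"),
        Element("SA", "splice acceptor"),
        Element("product", "tPA"),
        Element("terminator", "transcriptional termination site"),
    ]
    print(check_layout(construct))  # []
```

Moving the product gene ahead of the splice donor, for example, would be reported as "product gene is not 3' to the intron" under these assumptions.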

In another embodiment, the invention provides a method for producing 
a product of interest comprising culturing a eukaryotic cell which has been 
transfected with the DNA construct described above, so as to express the 
product gene and recovering the product. 

In a further embodiment, the invention provides a method for
producing eukaryotic cells having multiple copies of the product gene
comprising transfecting eukaryotic cells with the DNA construct described
above (where the selectable gene is an amplifiable gene), growing the cells
in a selective medium comprising an amplifying agent for a sufficient time
for amplification to occur, and selecting cells having multiple copies of
the product gene. Preferably, transfection of the cells is achieved using
electroporation.

After transfection of the host cells, most of the transfectants fail
to exhibit the selectable phenotype characteristic of the protein encoded
by the selectable gene, but surprisingly a small proportion of the
transfectants do exhibit the selectable phenotype, and among those
transfectants, the majority are found to express high levels of the desired
product encoded by the product gene. Thus, the invention provides an
improved method for the selection of recombinant host cells expressing high
levels of a desired product, which method is useful with a wide variety of
eukaryotic host cells and avoids the problems inherent in existing cell
selection technology.
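The selection outcome described above can be illustrated with a deliberately simplified model, which is an assumption of this sketch and not a result reported here. Suppose each primary transcript is spliced with probability s, set by the splice donor; spliced transcripts lose the intron, and with it the selectable gene, and yield product mRNA, while unspliced transcripts retain the selectable gene. For a transcription rate T (arbitrary units), marker output is then proportional to T(1 - s) and product output to T*s. If a cell survives selection only when marker output reaches a threshold θ, survivors must have T >= θ/(1 - s) and so produce at least θ*s/(1 - s) units of product. The function names and all numbers below are hypothetical.

```python
def marker_output(rate: float, s: float) -> float:
    """Toy marker output: only unspliced transcripts keep the intron-borne selectable gene."""
    return rate * (1.0 - s)


def product_output(rate: float, s: float) -> float:
    """Toy product output: assumed to come only from correctly spliced transcripts."""
    return rate * s


def survivors(rates: list[float], s: float, threshold: float) -> list[float]:
    """Transcription rates of cells whose marker output meets the selection threshold."""
    return [r for r in rates if marker_output(r, s) >= threshold]


if __name__ == "__main__":
    rates = [1.0, 2.0, 4.0, 8.0, 16.0]  # hypothetical per-cell transcription rates
    threshold = 2.0                      # hypothetical marker level needed to survive
    for s in (0.5, 0.75):
        kept = survivors(rates, s, threshold)
        floor = min(product_output(r, s) for r in kept)
        print(f"s={s}: {len(kept)} of {len(rates)} survive, minimum product {floor}")
```

In this toy model, raising s from 0.5 to 0.75 reduces the survivors from three of five to two of five, while the minimum product level among survivors rises from 2.0 to 6.0, matching θ*s/(1 - s). Real splicing, translation and integration-site effects are far more complex than this sketch.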




WO 96/04391 



PCT/US95/09576 



BRIEF DESCRIPTION OF THE DRAWINGS 
Figures 1A-1D illustrate schematically various DNA constructs
encompassed by the instant invention. The large arrows represent the
selectable gene and the product gene; the V formed by the dashed lines
shows the region of the precursor RNA internal to the 5' splice donor site
(SD) and 3' splice acceptor site (SA) that is excised from vectors that
contain a functional SD. The transcriptional regulatory region, selectable
gene, product gene and transcriptional termination site are depicted in
Figure 1A. Figure 1B depicts the DNA constructs of Example 1. The various
splice donor sequences are depicted, i.e., wild type ras splice donor
sequence (WT ras), mutant ras splice donor sequence (MUTANT ras) and
non-functional splice donor sequence (ΔGT). The probes used for Northern
blot analysis in Example 1 are shown in Figure 1B. Figure 1C depicts the DNA
constructs of Example 2 and Figure 1D depicts the DNA construct of Example
3 used for expression of anti-IgE VH.

Figure 2 depicts schematically the control DNA construct used in
Example 1.

Figures 3A-Q depict the nucleotide sequence (SEQ ID NO: 1) of the
DHFR/intron-(WT ras SD)-tPA expression vector of Example 1.

Figure 4 is a bar graph which shows the number of colonies that form
in selective medium after electroporation of linearized duplicate miniprep
DNAs prepared in parallel from the three vectors shown in Figure 1B (i.e.,
with wild type ras splice donor sequence [WT ras], mutant ras splice donor
sequence [MUTANT ras] and non-functional splice donor sequence [ΔGT]) and
from the control vector that has DHFR under control of the SV40 promoter and
tPA under control of the CMV promoter (see Figure 2). Cells were selected in
nucleoside-free medium and counted with an automated colony counter.

Figures 5A-C are bar graphs depicting expression of tPA from stable
pools and clones generated from the vectors shown in Figure 1B. In Figure
5A, more than 100 clones from each vector transfection were mixed, plated
in 24-well plates, and assayed by tPA ELISA at "saturation". In Figure 5B,
twenty clones chosen at random derived from each of the vectors were
assayed by tPA ELISA at "saturation". In Figure 5C, the pools mentioned in
Figure 5A (except the ΔGT pool) were exposed to 200 nM Mtx to select for
DHFR amplification and then pooled and assayed for tPA expression.

Figures 6A-P depict the nucleotide sequence (SEQ ID NO: 2) of the
DHFR/intron-(WT ras SD)-TNFr-IgG expression vector of Example 2.

Figures 7A-B are bar graphs depicting expression of TNFr-IgG using
dicistronic or control vectors (see Example 2). Vectors containing
TNFr-IgG (but otherwise identical to those described for tPA expression in
Example 1) were constructed (see Figure 1C), introduced into dp12.CHO cells
by electroporation, pooled, and assayed for product expression before
(Figure 7A) and after (Figure 7B) being subjected to amplification in 200 nM
Mtx.

Figure 8 depicts schematically the DNA construct used for expression
of the VL of anti-IgE in Example 3.

Figures 9A-O depict the nucleotide sequence (SEQ ID NO: 3) of the
anti-IgE VH expression vector of Example 3.

Figures 10A-Q depict the nucleotide sequence (SEQ ID NO: 4) of the
anti-IgE VL expression vector of Example 3.

Figure 11 is a bar graph depicting anti-IgE expression in Example 3.
Heavy (VH) and light (VL) chain expression vectors were constructed,
co-electroporated into CHO cells, and clones were selected and assayed for
antibody expression. Additionally, pools were established and assessed
with regard to expression before and after Mtx selection at 200 nM and 1 μM.

DESCRIPTION OF THE PREFERRED EMBODIMENTS

Definitions:

The "DNA construct" disclosed herein comprises a non-naturally
occurring DNA molecule which can either be provided as an isolate or
integrated in another DNA molecule, e.g., in an expression vector or the
chromosome of a eukaryotic host cell.

The term "selectable gene" as used herein refers to a DNA that
encodes a selectable marker necessary for the growth or survival of a host
cell under the particular cell culture conditions chosen. Accordingly, a
host cell that is transformed with a selectable gene will be capable of
growth or survival under certain cell culture conditions wherein a
non-transfected host cell is not capable of growth or survival. Typically, a
selectable gene will confer resistance to a drug or compensate for a
metabolic or catabolic defect in the host cell. Examples of selectable
genes are provided in the following table. See also Kaufman, Methods in
Enzymology, 185:537-566 (1990), for a review of these.



TABLE 1

Selectable Genes and their Selection Agents

Selection Agent                                 Selectable Gene
Methotrexate                                    Dihydrofolate reductase
Cadmium                                         Metallothionein
PALA                                            CAD
Xyl-A or adenosine and 2'-deoxycoformycin       Adenosine deaminase
Adenine, azaserine, and coformycin              Adenylate deaminase
6-Azauridine, pyrazofuran                       UMP synthetase
Mycophenolic acid                               IMP 5'-dehydrogenase
Mycophenolic acid with limiting xanthine        Xanthine-guanine phosphoribosyltransferase
Hypoxanthine, aminopterin, and thymidine (HAT)  Mutant HGPRTase or mutant thymidine kinase
5-Fluorodeoxyuridine                            Thymidylate synthetase
Multiple drugs, e.g., adriamycin,               P-glycoprotein 170
  vincristine or colchicine
Aphidicolin                                     Ribonucleotide reductase
Methionine sulfoximine                          Glutamine synthetase
β-Aspartyl hydroxamate or albizziin             Asparagine synthetase
Canavanine                                      Arginosuccinate synthetase
α-Difluoromethylornithine                       Ornithine decarboxylase
Compactin                                       HMG-CoA reductase
Tunicamycin                                     N-Acetylglucosaminyl transferase
Borrelidin                                      Threonyl-tRNA synthetase
Ouabain                                         Na+K+-ATPase



The preferred selectable gene is an amplifiable gene. As used herein,
the term "amplifiable gene" refers to a gene which is amplified (i.e.,
additional copies of the gene are generated which survive in
intrachromosomal or extrachromosomal form) under certain conditions. The
amplifiable gene usually encodes an enzyme (i.e., an amplifiable marker)
which is required for growth of eukaryotic cells under those conditions.
For example, the gene may encode DHFR which is amplified when a host cell
transformed therewith is grown in Mtx. According to Kaufman, the selectable
genes in Table 1 above can also be considered amplifiable genes. An example
of a selectable gene which is generally not considered to be an amplifiable
gene is the neomycin resistance gene (Cepko et al., supra).

As used herein, "selective medium" refers to nutrient solution used
for growing eukaryotic cells which have the selectable gene and therefore
includes a "selection agent". Commercially available media such as Ham's
F10 (Sigma), Minimal Essential Medium ([MEM], Sigma), RPMI-1640 (Sigma),
and Dulbecco's Modified Eagle's Medium ([DMEM], Sigma) are exemplary
nutrient solutions. In addition, any of the media described in Ham and
Wallace, Meth. Enz., 58:44 (1979), Barnes and Sato, Anal. Biochem., 102:255
(1980), U.S. Patent Nos. 4,767,704; 4,657,866; 4,927,762; or 4,560,655; WO
90/03430; WO 87/00195; U.S. Patent Re. 30,985; or U.S. Patent No.
5,122,469, the disclosures of all of which are incorporated herein by
reference, may be used as culture media. Any of these media may be
supplemented as necessary with hormones and/or other growth factors (such
as insulin, transferrin, or epidermal growth factor), salts (such as sodium
chloride, calcium, magnesium, and phosphate), buffers (such as HEPES),
nucleosides (such as adenosine and thymidine), antibiotics (such as
GENTAMYCIN™ drug), trace elements (defined as inorganic compounds usually
present at final concentrations in the micromolar range), and glucose or
an equivalent energy source. Any other necessary supplements may also be
included at appropriate concentrations that would be known to those skilled
in the art. The preferred nutrient solution comprises fetal bovine serum.

The term "selection agent" refers to a substance that interferes with
the growth or survival of a host cell that is deficient in a particular
selectable gene. Examples of selection agents are presented in Table 1
above. The selection agent preferably comprises an "amplifying agent" which
is defined for purposes herein as an agent for amplifying copies of the
amplifiable gene, such as Mtx if the amplifiable gene is DHFR. See Table
1 for examples of amplifying agents.

As used herein, the term "transcriptional initiation site" refers to
the nucleotide in the DNA construct corresponding to the first nucleotide
incorporated into the primary transcript, i.e., the mRNA precursor,
which site is generally provided at, or adjacent to, the 5' end of the DNA
construct.

The term "transcriptional termination site" refers to a sequence of 
DNA, normally represented at the 3' end of the DNA construct, that causes 
RNA polymerase to terminate transcription. 

As used herein, "transcriptional regulatory region" refers to a
region of the DNA construct that regulates transcription of the selectable
gene and the product gene. The transcriptional regulatory region normally
refers to a promoter sequence (i.e., a region of DNA involved in binding of
RNA polymerase to initiate transcription) which can be constitutive or
inducible and, optionally, an enhancer (i.e., a cis-acting DNA element,
usually from about 10-300 bp, that acts on a promoter to increase its
transcription).

As used herein, "product gene" refers to DNA that encodes a desired
protein or polypeptide product. Any product gene that is capable of
expression in a host cell may be used, although the methods of the
invention are particularly suited for obtaining high-level expression of
a product gene that is not also a selectable or amplifiable gene.
Accordingly, the protein or polypeptide encoded by a product gene typically
will be one that is not necessary for the growth or survival of a host cell
under the particular cell culture conditions chosen. For example, product
genes suitably encode a peptide, or may encode a polypeptide sequence of
amino acids for which the chain length is sufficient to produce higher
levels of tertiary and/or quaternary structure.

Examples of bacterial polypeptides or proteins include, e.g.,
alkaline phosphatase and β-lactamase. Examples of mammalian polypeptides
or proteins include molecules such as renin; a growth hormone, including
human growth hormone and bovine growth hormone; growth hormone releasing
factor; parathyroid hormone; thyroid stimulating hormone; lipoproteins;
alpha-1-antitrypsin; insulin A-chain; insulin B-chain; proinsulin; follicle
stimulating hormone; calcitonin; luteinizing hormone; glucagon; clotting
factors such as factor VIIIC, factor IX, tissue factor, and von Willebrand
factor; anti-clotting factors such as Protein C; atrial natriuretic factor;
lung surfactant; a plasminogen activator, such as urokinase or human urine
or tissue-type plasminogen activator (t-PA); bombesin; thrombin;
hemopoietic growth factor; tumor necrosis factor-alpha and -beta;
enkephalinase; RANTES (regulated on activation normally T-cell expressed
and secreted); human macrophage inflammatory protein (MIP-1-alpha); a serum
albumin such as human serum albumin; mullerian-inhibiting substance;
relaxin A-chain; relaxin B-chain; prorelaxin; mouse gonadotropin-associated
peptide; a microbial protein, such as beta-lactamase; DNase; inhibin;
activin; vascular endothelial growth factor (VEGF); receptors for hormones
or growth factors; integrin; protein A or D; rheumatoid factors; a
neurotrophic factor such as brain-derived neurotrophic factor (BDNF),
neurotrophin-3, -4, -5, or -6 (NT-3, NT-4, NT-5, or NT-6), or a nerve
growth factor such as NGF-β; platelet-derived growth factor (PDGF);
fibroblast growth factor such as aFGF and bFGF; epidermal growth factor
(EGF); transforming growth factor (TGF) such as TGF-alpha and TGF-beta,
including TGF-β1, TGF-β2, TGF-β3, TGF-β4, or TGF-β5; insulin-like growth
factor-I and -II (IGF-I and IGF-II); des(1-3)-IGF-I (brain IGF-I);
insulin-like growth factor binding proteins; CD proteins such as CD-3, CD-4,
CD-8, and CD-19; erythropoietin; osteoinductive factors; immunotoxins; a bone
morphogenetic protein (BMP); an interferon such as interferon-alpha, -beta,
and -gamma; colony stimulating factors (CSFs), e.g., M-CSF, GM-CSF, and
G-CSF; interleukins (ILs), e.g., IL-1 to IL-10; superoxide dismutase; T-cell
receptors; surface membrane proteins; decay accelerating factor; viral
antigen such as, for example, a portion of the AIDS envelope; transport
proteins; homing receptors; addressins; regulatory proteins; antibodies;
chimeric proteins such as immunoadhesins and fragments of any of the
above-listed polypeptides.

The product gene preferably does not consist of an anti-sense
sequence for inhibiting the expression of a gene present in the host.
Preferred proteins herein are therapeutic proteins such as TGF-β, TGF-α,
PDGF, EGF, FGF, IGF-I, DNase, plasminogen activators such as t-PA, clotting
factors such as tissue factor and factor VIII, hormones such as relaxin and
insulin, cytokines such as IFN-γ, chimeric proteins such as TNF receptor
IgG immunoadhesin (TNFr-IgG), or antibodies such as anti-IgE.

The term "intron" as used herein refers to a nucleotide sequence
present within the transcribed region of a gene or within a messenger RNA
precursor, which nucleotide sequence is capable of being excised, or
spliced, from the messenger RNA precursor by a host cell prior to
translation. Introns suitable for use in the present invention may be
prepared by any of several methods that are well known in the art, such as
purification from a naturally occurring nucleic acid or de novo synthesis.
The introns present in many naturally occurring eukaryotic genes have been
identified and characterized. Mount, Nuc. Acids Res., 10:459 (1982).
Artificial introns comprising functional splice sites also have been
described. Winey et al., Mol. Cell Biol., 9:329 (1989); Gatermann et al.,
Mol. Cell Biol., 9:1526 (1989). Introns may be obtained from naturally
occurring nucleic acids, for example, by digestion of a naturally occurring
nucleic acid with a suitable restriction endonuclease, or by PCR cloning
using primers complementary to sequences at the 5' and 3' ends of the
intron. Alternatively, introns of defined sequence and length may be
prepared synthetically using various methods in organic chemistry. Narang
et al., Meth. Enzymol., 68:90 (1979); Caruthers et al., Meth. Enzymol.,
154:287 (1985); Froehler et al., Nuc. Acids Res., 14:5399 (1986).

As used herein, "splice donor site" or "SD" refers to the DNA sequence
immediately surrounding the exon-intron boundary at the 5' end of the
intron, where the "exon" comprises the nucleic acid 5' to the intron. Many
splice donor sites have been characterized, and Ohshima et al., J. Mol.
Biol., 195:247-259 (1987) provides a review of these. An "efficient splice
donor sequence" refers to a nucleic acid sequence encoding a splice donor
site wherein the efficiency of splicing of messenger RNA precursors having
the splice donor sequence is between about 80% and 99%, and preferably
between 90% and 95%, as determined by quantitative PCR. Examples of
efficient splice donor sequences include the wild type (WT) ras splice
donor sequence and the GAC:GTAAGT sequence of Example 3. Other efficient
splice donor sequences can be readily selected using the techniques for
measuring the efficiency of splicing disclosed herein.

The terms "PCR" and "polymerase chain reaction" as used herein refer
to the in vitro amplification method described in US Patent No. 4,683,195
(issued July 28, 1987). In general, the PCR method involves repeated
cycles of primer extension synthesis, using two DNA primers capable of
hybridizing preferentially to a template nucleic acid comprising the
nucleotide sequence to be amplified. The PCR method can be used to clone
specific DNA sequences from total genomic DNA, cDNA transcribed from
cellular RNA, or viral or plasmid DNAs. Wang & Mark, in PCR Protocols, pp.
70-75 (Academic Press, 1990); Scharf, in PCR Protocols, pp. 84-98;
Kawasaki & Wang, in PCR Technology, pp. 89-97 (Stockton Press, 1989).
Reverse transcription-polymerase chain reaction (RT-PCR) can be used to
analyze RNA samples containing mixtures of spliced and unspliced mRNA
transcripts. Fluorescently tagged primers designed to span the intron are
used to amplify both spliced and unspliced targets. The resultant
amplification products are then separated by gel electrophoresis and
quantitated by measuring the fluorescent emission of the appropriate
band(s). A comparison is made to determine the amount of spliced and
unspliced transcripts present in the RNA sample.
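The RT-PCR comparison described above reduces to a ratio of band signals. The Python sketch below assumes that each band's fluorescence is proportional to the number of product molecules (reasonable when one labeled primer is incorporated per molecule) and that spliced and unspliced templates amplify with similar efficiency; in practice the shorter spliced product may amplify preferentially, so the result is an approximation. The function names and signal values are hypothetical; the 80-99% and 90-95% windows are the ranges given above for an efficient splice site.

```python
def splicing_efficiency(spliced_signal: float, unspliced_signal: float) -> float:
    """Fraction of transcripts that are spliced, estimated from band signals.

    Assumes band fluorescence is proportional to the number of product
    molecules and that both products amplify with similar efficiency.
    """
    if spliced_signal < 0 or unspliced_signal < 0:
        raise ValueError("band signals must be non-negative")
    total = spliced_signal + unspliced_signal
    if total == 0:
        raise ValueError("no signal in either band")
    return spliced_signal / total


def classify_splice_site(efficiency: float) -> str:
    """Label a splice site using the efficiency windows given in the text."""
    if 0.90 <= efficiency <= 0.95:
        return "efficient (preferred range)"
    if 0.80 <= efficiency <= 0.99:
        return "efficient"
    return "not efficient by this definition"


# Hypothetical band intensities in arbitrary fluorescence units.
eff = splicing_efficiency(spliced_signal=920.0, unspliced_signal=80.0)
print(f"{eff:.2f}", classify_splice_site(eff))  # prints: 0.92 efficient (preferred range)
```

Note that, under this definition, a site splicing more than about 99% of precursors falls outside the "efficient" window; the text defines efficiency by a range rather than by maximal splicing.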

One preferred splice donor sequence is a "consensus splice donor
sequence". The nucleotide sequences surrounding intron splice sites, which
sequences are evolutionarily highly conserved, are referred to as
"consensus splice donor sequences". In the mRNAs of higher eukaryotes, the
5' splice site occurs within the consensus sequence AG:GUAAGU (wherein the
colon denotes the site of cleavage and ligation). In the mRNAs of yeast,
the 5' splice site is bounded by the consensus sequence :GUAUGU. Padgett
et al., Ann. Rev. Biochem., 55:1119 (1986).

The expression "splice acceptor site" or "SA" refers to the sequence
immediately surrounding the intron-exon boundary at the 3' end of the
intron, where the "exon" comprises the nucleic acid 3' to the intron. Many
splice acceptor sites have been characterized, and Ohshima et al., J. Mol.
Biol., 195:247-259 (1987) provides a review of these. The preferred splice
acceptor site is an efficient splice acceptor site, i.e., a nucleic acid
sequence encoding a splice acceptor site wherein the efficiency of splicing
of messenger RNA precursors having the splice acceptor site is between
about 80% and 99%, and preferably between 90% and 95%, as determined by
quantitative PCR. The splice acceptor site may comprise a consensus
sequence. In the mRNAs of higher eukaryotes, the 3' splice acceptor site
occurs within the consensus sequence (U/C)nNCAG:G, where (U/C)n denotes a
run of pyrimidines. In the mRNAs of yeast, the 3' splice acceptor site is
bounded by the consensus sequence (C/U)AG:. Padgett et al., supra.
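To make the consensus notation concrete, the following Python sketch scans a DNA or RNA sequence for exact matches to the donor and acceptor consensus sequences quoted above and reports the position of each ":" cleavage point. It is an illustration only: natural splice sites frequently deviate from the consensus, so a practical search would score partial matches rather than require exact ones. The minimum pyrimidine-tract length `min_tract` for the higher-eukaryote acceptor is an assumption, since the text leaves n unspecified; the function names are hypothetical.

```python
import re


def _pattern(kind: str, organism: str, min_tract: int) -> re.Pattern:
    """Build a lookahead regex whose group 1 ends at the ':' cleavage point."""
    if organism not in ("higher", "yeast"):
        raise ValueError(f"unknown organism: {organism!r}")
    if kind == "donor":
        if organism == "higher":
            return re.compile(r"(?=(AG)(GTAAGT))")   # AG:GUAAGU
        return re.compile(r"(?=()(GTATGT))")         # :GUAUGU
    if kind == "acceptor":
        if organism == "higher":
            # (U/C)n N C A G : G, requiring n >= min_tract (assumed threshold)
            return re.compile(r"(?=([CT]{%d,}[ACGT]CAG)(G))" % min_tract)
        return re.compile(r"(?=([CT]AG)())")         # (C/U)AG:
    raise ValueError(f"unknown site kind: {kind!r}")


def find_splice_sites(seq: str, kind: str, organism: str = "higher",
                      min_tract: int = 6) -> list[int]:
    """Return sorted 0-based positions of the ':' cleavage point for each match."""
    dna = seq.upper().replace("U", "T")  # accept RNA or DNA input
    pattern = _pattern(kind, organism, min_tract)
    # Overlapping matches (e.g. different starts inside one pyrimidine tract)
    # can share a cleavage point, so collect positions in a set.
    return sorted({m.start() + len(m.group(1)) for m in pattern.finditer(dna)})


print(find_splice_sites("TTCAGGTAAGTCC", "donor"))                 # [5]
print(find_splice_sites("CTTTCTCCTTACAGGAA", "acceptor"))          # [14]
print(find_splice_sites("AAGTATGTAA", "donor", organism="yeast"))  # [2]
```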

As used herein, "culturing for sufficient time to allow amplification
to occur" refers to the act of physically culturing the eukaryotic host
cells which have been transformed with the DNA construct in cell culture
media containing the amplifying agent, until the copy number of the
amplifiable gene (and preferably also the copy number of the product gene)
in the host cells has increased relative to the transformed cells prior to
this culturing.

The term "expression" as used herein refers to transcription or
translation occurring within a host cell. The level of expression of a
product gene in a host cell may be determined on the basis of either the
amount of corresponding mRNA that is present in the cell or the amount of
the protein encoded by the product gene that is produced by the cell. For
example, mRNA transcribed from a product gene is desirably quantitated by
northern hybridization. Sambrook et al., Molecular Cloning: A Laboratory
Manual, pp. 7.3-7.57 (Cold Spring Harbor Laboratory Press, 1989). Protein
encoded by a product gene can be quantitated either by assaying for the
biological activity of the protein or by employing assays that are
independent of such activity, such as western blotting or radioimmunoassay
using antibodies that are capable of reacting with the protein. Sambrook
et al., Molecular Cloning: A Laboratory Manual, pp. 18.1-18.88 (Cold Spring
Harbor Laboratory Press, 1989).

Modes for Carrying Out the Invention 

Methods and compositions are provided for enhancing the stability
and/or copy number of a transcribed sequence in order to allow for elevated
levels of an RNA sequence of interest. In general, the methods of the
present invention involve transfecting a eukaryotic host cell with an
expression vector comprising both a product gene encoding a desired
polypeptide and a selectable gene (preferably an amplifiable gene).

Selectable genes and product genes may be obtained from genomic DNA,
cDNA transcribed from cellular RNA, or by in vitro synthesis. For example,
libraries are screened with probes (such as antibodies or oligonucleotides
of about 20-80 bases) designed to identify the selectable gene or the
product gene (or the protein(s) encoded thereby). Screening the cDNA or
genomic library with the selected probe may be conducted using standard
procedures as described in chapters 10-12 of Sambrook et al., Molecular
Cloning: A Laboratory Manual (New York: Cold Spring Harbor Laboratory
Press, 1989). An alternative means to isolate the selectable gene or
product gene is to use PCR methodology as described in section 14 of
Sambrook et al., supra.

A preferred method of practicing this invention is to use carefully
selected oligonucleotide sequences to screen cDNA libraries from various
tissues known to contain the selectable gene or product gene. The
oligonucleotide sequences selected as probes should be of sufficient length
and sufficiently unambiguous that false positives are minimized.

The oligonucleotide generally is labeled such that it can be detected
upon hybridization to DNA in the library being screened. The preferred
method of labeling is to use 32P-labeled ATP with polynucleotide kinase,
as is well known in the art, to radiolabel the oligonucleotide. However,
other methods may be used to label the oligonucleotide, including, but not
limited to, biotinylation or enzyme labeling.

Sometimes, the DNA encoding the selectable gene and product gene is
preceded by DNA encoding a signal sequence having a specific cleavage site
at the N-terminus of the mature protein or polypeptide. In general, the
signal sequence may be a component of the expression vector, or it may be
a part of the selectable gene or product gene that is inserted into the
expression vector. If a heterologous signal sequence is used, it
preferably is one that is recognized and processed (i.e., cleaved by a
signal peptidase) by the host cell. For yeast secretion, the native signal
sequence may be substituted by, e.g., the yeast invertase leader, alpha
factor leader (including Saccharomyces and Kluyveromyces α-factor leaders,
the latter described in U.S. Pat. No. 5,010,162 issued 23 April 1991), or
acid phosphatase leader, the C. albicans glucoamylase leader (EP 362,179
published 4 April 1990), or the signal described in WO 90/13646 published
15 November 1990. In mammalian cell expression, the native signal sequence
of the protein of interest is satisfactory, although other mammalian signal
sequences may be suitable, such as signal sequences from secreted
polypeptides of the same or related species, as well as viral secretory
leaders, for example, the herpes simplex gD signal. The DNA for such
precursor region is ligated in reading frame to the selectable gene or
product gene.

As shown in Figure 1A, the selectable gene is generally provided at
the 5' end of the DNA construct, and this selectable gene is followed by
the product gene. Therefore, the full-length (unspliced) message will
contain the selectable gene (DHFR in Figure 1A) as the first open reading
frame and will therefore generate DHFR protein to allow selection of stable
transfectants. The full-length message is not expected to generate
appreciable amounts of the protein of interest, as the second AUG in a
dicistronic message is an inefficient initiator of translation in
mammalian cells (Kozak, J. Cell Biol., 115:887-903 [1991]).
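The reasoning above, that the unspliced message is read as the selectable gene while the spliced message is read as the product gene, can be illustrated with a toy model. The Python sketch below assumes a strict first-AUG scanning rule (no leaky scanning, no reinitiation, no Kozak-context effects) and uses short made-up sequences: the leader, the intron with its embedded "selectable" ORF, and the "product" ORF are hypothetical placeholders, not sequences from the construct of Figure 1A, and the splice positions are given explicitly rather than detected.

```python
from typing import Optional

STOP_CODONS = {"UAA", "UAG", "UGA"}


def first_orf(mrna: str) -> Optional[str]:
    """ORF starting at the 5'-most AUG and running to the first in-frame stop."""
    rna = mrna.upper().replace("T", "U")
    start = rna.find("AUG")
    if start == -1:
        return None
    for i in range(start, len(rna) - 2, 3):
        if rna[i:i + 3] in STOP_CODONS:
            return rna[start:i + 3]
    return rna[start:]  # no in-frame stop codon before the 3' end


def splice(pre_mrna: str, donor: int, acceptor: int) -> str:
    """Remove the intron occupying pre_mrna[donor:acceptor]."""
    return pre_mrna[:donor] + pre_mrna[acceptor:]


# Hypothetical toy construct: leader | intron (holding selectable ORF) | product ORF
leader = "GCCGCC"                                # 5' leader with no AUG
selectable_orf = "AUGGCUUAA"                     # stands in for DHFR
intron = "GUAAGU" + "CC" + selectable_orf + "UUUUUUUUUUACAG"
product_orf = "AUGAAAUGA"                        # stands in for the product gene
pre_mrna = leader + intron + product_orf

donor = len(leader)
acceptor = donor + len(intron)

print(first_orf(pre_mrna))                           # AUGGCUUAA (unspliced: selectable gene)
print(first_orf(splice(pre_mrna, donor, acceptor)))  # AUGAAAUGA (spliced: product gene)
```

Under this simplified rule the product ORF in the unspliced transcript is never reached, which mirrors the expectation stated above; real ribosomes initiate at downstream AUGs at low frequency.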

The selectable gene is positioned within an intron. Introns are
noncoding nucleotide sequences, normally present within many eukaryotic
genes, which are removed from newly transcribed mRNA precursors in a
multiple-step process collectively referred to as splicing.

A single mechanism is thought to be responsible for the splicing of
mRNA precursors in mammalian, plant, and yeast cells. In general, the
process of splicing requires that the 5' and 3' ends of the intron be
correctly cleaved and the resulting ends of the mRNA be accurately joined,
such that a mature mRNA having the proper reading frame for protein
synthesis is produced. Analysis of a variety of naturally occurring and
synthetically constructed mutant genes has shown that nucleotide changes
at many of the positions within the consensus sequences at the 5' and 3'
splice sites have the effect of reducing or abolishing the synthesis of
mature mRNA. Sharp, Science, 235:766 (1987); Padgett et al., Ann. Rev.
Biochem., 55:1119 (1986); Green, Ann. Rev. Genet., 20:671 (1986).

Mutational studies also have shown that RNA secondary structures involving
splicing sites can affect the efficiency of splicing. Solnick, Cell,
43:667 (1985); Konarska et al., Cell, 42:165 (1985).

The length of the intron may also affect the efficiency of splicing.
By making deletion mutations of different sizes within the large intron of
the rabbit beta-globin gene, Wieringa et al., Cell, 37:915 (1984),
determined that the minimum intron length necessary for correct splicing
is about 69 nucleotides. Similar studies of the intron of the adenovirus
E1A region have shown that an intron length of about 78 nucleotides allows
correct splicing to occur, but at reduced efficiency. Increasing the
length of the intron to 91 nucleotides restores normal splicing efficiency,
whereas truncating the intron to 63 nucleotides abolishes correct splicing.
Ulfendahl et al., Nuc. Acids Res., 13:6299 (1985).

To be useful in the invention, the intron must have a length such
that splicing of the intron from the mRNA is efficient. The preparation of
introns of differing lengths is a routine matter, involving methods well
known in the art, such as de novo synthesis or in vitro deletion
mutagenesis of an existing intron. Typically, the intron will have a length
of at least about 150 nucleotides, since introns which are shorter than
this tend to be spliced less efficiently. The intron can be up to 30 kb or
more in length; however, as a general proposition, the intron is less than
about 10 kb in length.
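The length guidance above can be collected into a rule-of-thumb check. In this Python sketch the thresholds (about 69 nt for correct splicing in the rabbit beta-globin study, about 150 nt as the typical lower bound, and about 10 kb as the usual upper bound) are taken from the text; treating them as hard cut-offs, and the category labels themselves, are assumptions for illustration, since actual splicing efficiency also depends on sequence context. Because the selectable gene is placed inside the intron, the length checked here would include that gene.

```python
# Approximate thresholds quoted in the text (nucleotides).
MIN_CORRECT_SPLICING_NT = 69   # rabbit beta-globin deletion study
TYPICAL_MIN_NT = 150           # shorter introns tend to splice less efficiently
TYPICAL_MAX_NT = 10_000        # generally < ~10 kb, though 30 kb or more is possible


def assess_intron_length(length_nt: int) -> str:
    """Classify a proposed intron length against the rules of thumb above."""
    if length_nt <= 0:
        raise ValueError("intron length must be positive")
    if length_nt < MIN_CORRECT_SPLICING_NT:
        return "too short: correct splicing unlikely"
    if length_nt < TYPICAL_MIN_NT:
        return "short: splicing may be inefficient"
    if length_nt <= TYPICAL_MAX_NT:
        return "typical range"
    return "long: possible but atypical"


for n in (50, 100, 2_000, 25_000):
    print(n, assess_intron_length(n))
```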

The intron is modified to contain the selectable gene, which is not
normally present within the intron, using any of the various known methods
for modifying a nucleic acid in vitro. Typically, a selectable gene will be
introduced into an intron by first cleaving the intron with a restriction
endonuclease, and then covalently joining the resulting restriction
fragments to the selectable gene in the correct orientation for host cell
expression, for example by ligation with a DNA ligase enzyme.
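The cleave-and-ligate step can be modelled at the level of sequence strings. The Python sketch below is a simplification under stated assumptions: it tracks only the top strand, requires the restriction site to occur exactly once in the intron, and assumes the insert already carries ends compatible with the cut. EcoRI (which cuts G^AATTC) is used as an example enzyme; the intron and insert sequences and the function name are hypothetical. The sketch does not check orientation, which in practice would be confirmed by restriction analysis or sequencing of the resulting plasmid.

```python
def insert_at_site(intron: str, site: str, cut_offset: int, insert: str) -> str:
    """Cut `intron` within its single `site` and ligate `insert` into the gap.

    cut_offset is the position of the top-strand cut within the recognition
    site (1 for EcoRI, G^AATTC). Only the top strand is modelled.
    """
    pos = intron.find(site)
    if pos == -1:
        raise ValueError(f"{site} not found in intron")
    if intron.find(site, pos + 1) != -1:
        raise ValueError(f"{site} is not unique in intron")
    cut = pos + cut_offset
    left, right = intron[:cut], intron[cut:]
    return left + insert + right


ECORI = "GAATTC"
intron = "GTAAGTCCGAATTCGG" + "T" * 10 + "ACAG"  # hypothetical intron, one EcoRI site
insert = "AATTC" + "ATGGCTTAA" + "G"             # hypothetical EcoRI fragment, top strand
recombinant = insert_at_site(intron, ECORI, 1, insert)
print(recombinant)
print(recombinant.count(ECORI))  # 2: an EcoRI site is regenerated at each junction
```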

The DNA construct is dicistronic, i.e., the selectable gene and
product gene are both under the transcriptional control of a single
transcriptional regulatory region. As mentioned above, the transcriptional
regulatory region comprises a promoter. Suitable promoting sequences for
use with yeast hosts include the promoters for 3-phosphoglycerate kinase
(Hitzeman et al., J. Biol. Chem., 255:2073 [1980]) or other glycolytic
enzymes (Hess et al., J. Adv. Enzyme Reg., 7:149 [1968]; and Holland,
Biochemistry, 17:4900 [1978]), such as enolase, glyceraldehyde-3-phosphate
dehydrogenase, hexokinase, pyruvate decarboxylase, phosphofructokinase,
glucose-6-phosphate isomerase, 3-phosphoglycerate mutase, pyruvate kinase,
triosephosphate isomerase, phosphoglucose isomerase, and glucokinase.

Other yeast promoters, which are inducible promoters having the
additional advantage of transcription controlled by growth conditions, are
the promoter regions for alcohol dehydrogenase 2, isocytochrome C, acid
phosphatase, degradative enzymes associated with nitrogen metabolism,
metallothionein, glyceraldehyde-3-phosphate dehydrogenase, and enzymes
responsible for maltose and galactose utilization. Suitable vectors and
promoters for use in yeast expression are further described in Hitzeman et
al., EP 73,657A. Yeast enhancers also are advantageously used with yeast
promoters.

Expression control sequences are known for eukaryotes. Virtually all
eukaryotic genes have an AT-rich region located approximately 25 to 30
bases upstream from the site where transcription is initiated. Another
sequence found 70 to 80 bases upstream from the start of transcription of
many genes is a CXCAAT region where X may be any nucleotide.
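The positional rules above can be expressed as a scan of the region immediately upstream of the transcription start site. In the Python sketch below, the windows (25 to 30 bases for the AT-rich region, 70 to 80 bases for the CXCAAT region) come from the text and are applied to the 5' end of each motif match. Using the TATA-box consensus TATAWAW (W = A or T) to stand for "an AT-rich region" is an assumption, as are the function name and the toy sequence.

```python
import re

# Motifs as regular expressions (DNA). TATAWAW for the AT-rich region is an
# assumption; C[ACGT]CAAT encodes the text's CXCAAT (X = any nucleotide).
MOTIFS = {
    "AT-rich (TATA-like)": (r"TATA[AT]A[AT]", (25, 30)),
    "CXCAAT": (r"C[ACGT]CAAT", (70, 80)),
}


def scan_upstream(upstream: str) -> dict[str, list[int]]:
    """Report motif matches whose 5' end lies in the expected window.

    `upstream` must end with the base immediately 5' of the transcription
    start site, so the base at index i lies len(upstream) - i bases upstream.
    Positions are returned as negative offsets (e.g. -28).
    """
    seq = upstream.upper()
    hits: dict[str, list[int]] = {}
    for name, (regex, (near, far)) in MOTIFS.items():
        offsets = []
        for m in re.finditer(f"(?={regex})", seq):  # lookahead: overlapping hits
            distance = len(seq) - m.start()
            if near <= distance <= far:
                offsets.append(-distance)
        hits[name] = offsets
    return hits


# Hypothetical 100-base upstream region built around the two motifs.
upstream = "G" * 25 + "CCCAAT" + "G" * 41 + "TATAAAA" + "G" * 21
print(scan_upstream(upstream))
# {'AT-rich (TATA-like)': [-28], 'CXCAAT': [-75]}
```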

Product gene transcription from vectors in mammalian host cells is
controlled by promoters obtained from the genomes of viruses such as
polyoma virus, fowlpox virus (UK 2,211,504 published 5 July 1989),
adenovirus (such as Adenovirus 2), bovine papilloma virus, avian sarcoma
virus, cytomegalovirus, a retrovirus, hepatitis-B virus and most preferably
Simian Virus 40 (SV40), from heterologous mammalian promoters, e.g., the
actin promoter or an immunoglobulin promoter, from heat-shock promoters,
and from the promoter normally associated with the product gene, provided
such promoters are compatible with the host cell systems.

The early and late promoters of the SV40 virus are conveniently
obtained as an SV40 restriction fragment that also contains the SV40 viral
origin of replication. Fiers et al., Nature, 273:113 (1978); Mulligan and
Berg, Science, 209:1422-1427 (1980); Pavlakis et al., Proc. Natl. Acad.
Sci. USA, 78:7398-7402 (1981). The immediate early promoter of the human
cytomegalovirus (CMV) is conveniently obtained as a HindIII E restriction
fragment. Greenaway et al., Gene, 18:355-360 (1982). A system for
expressing DNA in mammalian hosts using the bovine papilloma virus as a
vector is disclosed in U.S. 4,419,446. A modification of this system is
described in U.S. 4,601,978. See also Gray et al., Nature, 295:503-508
(1982) on expressing cDNA encoding immune interferon in monkey cells;
Reyes et al., Nature, 297:598-601 (1982) on expression of human
β-interferon cDNA in mouse cells under the control of a thymidine kinase
promoter from herpes simplex virus; Canaani and Berg, Proc. Natl. Acad.
Sci. USA, 79:5166-5170 (1982) on expression of the human interferon β1
gene in cultured mouse and rabbit cells; and Gorman et al., Proc. Natl.
Acad. Sci. USA, 79:6777-6781 (1982) on expression of bacterial CAT
sequences in CV-1 monkey kidney cells, chicken embryo fibroblasts, Chinese
hamster ovary cells, HeLa cells, and mouse NIH-3T3 cells using the Rous
sarcoma virus long terminal repeat as a promoter.

Preferably, the transcriptional regulatory region in higher eukaryotes
comprises an enhancer sequence. Enhancers are relatively orientation- and
position-independent, having been found 5' (Laimins et al., Proc. Natl.
Acad. Sci. USA, 78:993 [1981]) and 3' (Lusky et al., Mol. Cell Bio.,
3:1108 [1983]) to the transcription unit, within an intron (Banerji et al.,
Cell, 33:729 [1983]), as well as within the coding sequence itself (Osborne
et al., Mol. Cell Bio., 4:1293 [1984]). Many enhancer sequences are now
known from mammalian genes (globin, elastase, albumin, α-fetoprotein, and
insulin). Typically, however, one will use an enhancer from a eukaryotic
cell virus. Examples include the SV40 enhancer on the late side of the
replication origin (bp 100-270), the cytomegalovirus early promoter
enhancer (CMV), the polyoma enhancer on the late side of the replication
origin, and adenovirus enhancers. See also Yaniv, Nature, 297:17-18 (1982)
on enhancing elements for activation of eukaryotic promoters. The enhancer
may be spliced into the vector at a position 5' or 3' to the product gene,
but is preferably located at a site 5' from the promoter.

The DNA construct has a transcriptional initiation site following the
transcriptional regulatory region and a transcriptional termination region
following the product gene (see Figure 1A). These sequences are provided
in the DNA construct using techniques which are well known in the art.

The DNA construct normally forms part of an expression vector which
may have other components such as an origin of replication (i.e., a nucleic
acid sequence that enables the vector to replicate in one or more selected
host cells) and, if desired, one or more additional selectable gene(s).
Construction of suitable vectors containing the desired coding and control
sequences employs standard ligation techniques. Isolated plasmids or DNA
fragments are cleaved, tailored, and religated in the form desired to
generate the plasmids required.

Generally, in cloning vectors the origin of replication is one that
enables the vector to replicate independently of the host chromosomal DNA,
and includes origins of replication or autonomously replicating sequences.
Such sequences are well known. The 2μ plasmid origin of replication is
suitable for yeast, and various viral origins (SV40, polyoma, adenovirus,
VSV, or BPV) are useful for cloning vectors in mammalian cells. Generally,
the origin of replication component is not needed for mammalian expression
vectors (the SV40 origin may typically be used only because it contains the
early promoter).

Most expression vectors are "shuttle" vectors, i.e., they are capable
of replication in at least one class of organisms but can be transfected
into another organism for expression. For example, a vector is cloned in
E. coli and then the same vector is transfected into yeast or mammalian
cells for expression even though it is not capable of replicating
independently of the host cell chromosome.

For analysis to confirm correct sequences in plasmids constructed,
plasmids from the transformants are prepared, analyzed by restriction,
and/or sequenced by the method of Messing et al., Nucleic Acids Res.,
9:309 (1981) or by the method of Maxam et al., Methods in Enzymology,
65:499 (1980).

The expression vector having the DNA construct prepared as discussed
above is transformed into a eukaryotic host cell. Suitable host cells for
cloning or expressing the vectors herein are yeast or higher eukaryote
cells.

Eukaryotic microbes such as filamentous fungi or yeast are suitable
hosts for vectors containing the product gene. Saccharomyces cerevisiae,
or common baker's yeast, is the most commonly used among lower eukaryotic
host microorganisms. However, a number of other genera, species, and
strains are commonly available and useful herein, such as S. pombe [Beach
and Nurse, Nature, 290:140 (1981)], Kluyveromyces lactis [Louvencourt et
al., J. Bacteriol., 737 (1983)], Yarrowia [EP 402,226], Pichia pastoris [EP
183,070], Trichoderma reesei [EP 244,234], Neurospora crassa [Case et al.,
Proc. Natl. Acad. Sci. USA, 76:5259-5263 (1979)], and Aspergillus hosts
such as A. nidulans [Ballance et al., Biochem. Biophys. Res. Commun.,
112:284-289 (1983); Tilburn et al., Gene, 26:205-221 (1983); Yelton et al.,
Proc. Natl. Acad. Sci. USA, 81:1470-1474 (1984)] and A. niger [Kelly and
Hynes, EMBO J., 4:475-479 (1985)].

Suitable host cells for the expression of the product gene are also
derived from multicellular organisms. Such host cells are capable of
complex processing and glycosylation activities. In principle, any higher
eukaryotic cell culture is workable, whether from vertebrate or
invertebrate culture. Examples of invertebrate cells include plant and
insect cells. Numerous baculoviral strains and variants and corresponding
permissive insect host cells from hosts such as Spodoptera frugiperda
(caterpillar), Aedes aegypti (mosquito), Aedes albopictus (mosquito),
Drosophila melanogaster (fruit fly), and Bombyx mori host cells have been
identified. See, e.g., Luckow et al., Bio/Technology, 6:47-55 (1988);
Miller et al., in Genetic Engineering, Setlow, J.K. et al., eds., Vol. 8
(Plenum Publishing, 1986), pp. 277-279; and Maeda et al., Nature,
315:592-594 (1985). A variety of such viral strains are publicly available,
e.g., the L-1 variant of Autographa californica NPV and the Bm-5 strain of
Bombyx mori NPV, and such viruses may be used as the virus herein according
to the present invention, particularly for transfection of Spodoptera
frugiperda cells.

Plant cell cultures of cotton, corn, potato, soybean, petunia,
tomato, and tobacco can be utilized as hosts. Typically, plant cells are
transfected by incubation with certain strains of the bacterium
Agrobacterium tumefaciens, which has been previously manipulated to contain
the product gene. During incubation of the plant cell culture with A.
tumefaciens, the product gene is transferred to the plant cell host such
that it is transfected, and will, under appropriate conditions, express the
product gene. In addition, regulatory and signal sequences compatible with
plant cells are available, such as the nopaline synthase promoter and
polyadenylation signal sequences. Depicker et al., J. Mol. Appl. Gen.,
1:561 (1982). In addition, DNA segments isolated from the upstream region
of the T-DNA 780 gene are capable of activating or increasing transcription
levels of plant-expressible genes in recombinant DNA-containing plant
tissue. EP 321,196 published 21 June 1989.

However, interest has been greatest in vertebrate cells, and
propagation of vertebrate cells in culture (tissue culture) has become a
routine procedure in recent years [Tissue Culture, Academic Press, Kruse
and Patterson, editors (1973)]. Examples of useful mammalian host cell
lines are monkey kidney CV1 line transformed by SV40 (COS-7, ATCC CRL
1651); human embryonic kidney line (293 or 293 cells subcloned for growth
in suspension culture, Graham et al., J. Gen. Virol., 36:59 [1977]); baby
hamster kidney cells (BHK, ATCC CCL 10); Chinese hamster ovary cells/-DHFR
(CHO, Urlaub and Chasin, Proc. Natl. Acad. Sci. USA, 77:4216 [1980]);
dp12.CHO cells (EP 307,247 published 15 March 1989); mouse Sertoli cells
(TM4, Mather, Biol. Reprod., 23:243-251 [1980]); monkey kidney cells (CV1,
ATCC CCL 70); African green monkey kidney cells (VERO-76, ATCC CRL-1587);
human cervical carcinoma cells (HELA, ATCC CCL 2); canine kidney cells
(MDCK, ATCC CCL 34); buffalo rat liver cells (BRL 3A, ATCC CRL 1442); human
lung cells (WI38, ATCC CCL 75); human liver cells (Hep G2, HB 8065); mouse
mammary tumor (MMT 060562, ATCC CCL51); TRI cells (Mather et al., Annals
N.Y. Acad. Sci., 383:44-68 [1982]); MRC 5 cells; FS4 cells; and a human
hepatoma line (Hep G2).

Host cells are transformed with the above-described expression or
cloning vectors of this invention and cultured in conventional nutrient
media modified as appropriate for inducing promoters, selecting
transformants, or amplifying the genes encoding the desired sequences.

Infection with Agrobacterium tumefaciens is used for transformation
of certain plant cells, as described by Shaw et al., Gene, 23:315 (1983)
and WO 89/05859 published 29 June 1989. For mammalian cells without such
cell walls, the calcium phosphate precipitation method of Graham and van
der Eb, Virology, 52:456-457 (1978) may be used. General aspects of
mammalian cell host system transformations have been described by Axel in
U.S. 4,399,216 issued 16 August 1983. Transformations into yeast are
typically carried out according to the method of Van Solingen et al., J.
Bact., 130:946 (1977) and Hsiao et al., Proc. Natl. Acad. Sci. (USA),
76:3829 (1979). However, other methods for introducing DNA into cells, such
as nuclear injection or protoplast fusion, may also be used.

In the preferred embodiment, the DNA is introduced into the host cells
using electroporation. See Andreason, J. Tiss. Cult. Meth., 15:56-62
(1993), for a review of electroporation techniques useful for practicing
the instantly claimed invention. It was discovered that electroporation
techniques for introducing the DNA construct into the host cells were
preferable over calcium phosphate precipitation techniques insofar as the
latter could cause the DNA to break up and form concatemers.

The mammalian host cells used to express the product gene herein
may be cultured in a variety of media as discussed in the definitions
section above. The medium contains the selection agent used for selecting
transformed host cells which have taken up the DNA construct (either as an
intra- or extra-chromosomal element). To achieve selection of the
transformed eukaryotic cells, the host cells may be grown in cell culture
plates, and individual colonies expressing the selectable gene (and thus
the product gene) can be isolated and grown in growth medium until the
nutrients are depleted. The host cells are then analyzed for transcription
and/or transformation as discussed below. The culture conditions, such as
temperature, pH, and the like, are those previously used with the host cell
selected for expression, and will be apparent to the ordinarily skilled
artisan.

Gene amplification and/or expression may be measured in a sample
directly, for example, by conventional Southern blotting, Northern blotting
to quantitate the transcription of mRNA (Thomas, Proc. Natl. Acad. Sci.
USA, 77:5201-5205 [1980]), dot blotting (DNA analysis), or in situ
hybridization, using an appropriately labeled probe, based on the sequences
provided herein. Various labels may be employed, most commonly
radioisotopes, particularly 32P. However, other techniques may also be
employed, such as using biotin-modified nucleotides for introduction into
a polynucleotide. The biotin then serves as the site for binding to avidin
or antibodies, which may be labeled with a wide variety of labels, such as
radionuclides, fluorescers, enzymes, or the like. Alternatively,
antibodies may be employed that can recognize specific duplexes, including
DNA duplexes, RNA duplexes, and DNA-RNA hybrid duplexes or DNA-protein
duplexes. The antibodies in turn may be labeled and the assay may be
carried out where the duplex is bound to a surface, so that upon the
formation of duplex on the surface, the presence of antibody bound to the
duplex can be detected.

Gene expression, alternatively, may be measured by immunological
methods, such as immunohistochemical staining of tissue sections and assay
of cell culture or body fluids, to quantitate directly the expression of
gene product. With immunohistochemical staining techniques, a cell sample
is prepared, typically by dehydration and fixation, followed by reaction
with labeled antibodies specific for the gene product, where the
labels are usually visually detectable, such as enzymatic labels,
fluorescent labels, luminescent labels, and the like. A particularly
sensitive staining technique suitable for use in the present invention is
described by Hsu et al., Am. J. Clin. Path., 75:734-738 (1980).

In the preferred embodiment, the mRNA is analyzed by quantitative PCR
(to determine the efficiency of splicing) and protein expression is
measured using ELISA as described in Example 1 herein.
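The text does not say how splicing efficiency is derived from the quantitative PCR data. A minimal sketch, assuming separate qPCR assays for spliced and unspliced (full-length) transcripts that amplify with the same per-cycle efficiency, converts the two threshold cycles (Ct) into a spliced fraction using the exponential-amplification relation. The function name and the equal-efficiency assumption are illustrative, not taken from the examples.

```python
def fraction_spliced(ct_spliced: float, ct_unspliced: float, efficiency: float = 1.0) -> float:
    """Estimate the spliced fraction of a transcript from two qPCR Ct values.

    Hypothetical helper. Assumes both amplicons amplify with the same
    per-cycle efficiency (1.0 = perfect doubling each cycle), so the
    starting-template ratio spliced/unspliced equals
    (1 + efficiency) ** (ct_unspliced - ct_spliced).
    """
    if not 0.0 < efficiency <= 1.0:
        raise ValueError("efficiency must be in (0, 1]")
    growth = 1.0 + efficiency
    ratio = growth ** (ct_unspliced - ct_spliced)  # spliced : unspliced
    return ratio / (1.0 + ratio)


if __name__ == "__main__":
    # Illustrative Ct values only: the spliced amplicon crosses threshold
    # 3 cycles earlier, i.e. about 8x more spliced than unspliced template.
    print(round(fraction_spliced(20.0, 23.0), 3))  # 0.889
```

If the two assays have measurably different efficiencies, each template amount should be computed against its own standard curve instead of using a single shared growth factor.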

The product of interest preferably is recovered from the culture
medium as a secreted polypeptide, although it also may be recovered from
host cell lysates when directly expressed without a secretory signal. When
the product gene is expressed in a recombinant cell other than one of human
origin, the product of interest is completely free of proteins or
polypeptides of human origin. However, it is necessary to purify the
product of interest from recombinant cell proteins or polypeptides to
obtain preparations that are substantially homogeneous as to the product
of interest. As a first step, the culture medium or lysate is centrifuged
to remove particulate cell debris. The product of interest thereafter is
purified from contaminant soluble proteins and polypeptides, for example,
by fractionation on immunoaffinity or ion-exchange columns; ethanol
precipitation; reverse phase HPLC; chromatography on silica or on an
anion-exchange resin such as DEAE; chromatofocusing; SDS-PAGE; ammonium
sulfate precipitation; gel filtration using, for example, Sephadex G-75;
chromatography on plasminogen columns to bind the product of interest; and
protein A Sepharose columns to remove contaminants such as IgG.

The following examples are offered by way of illustration only and
are not intended to limit the invention in any manner. All patent and
literature references cited herein are expressly incorporated by reference.

EXAMPLE 1 

tPA production using the dicistronic expression vectors
The aim was to increase the homogeneity of expression levels among stable
clones by expressing a selectable marker (such as DHFR) and the protein of
interest from a single promoter. These vectors divert most of the transcript
to product expression while linking it, at a fixed ratio, to DHFR expression
via differential splicing.

Vectors were constructed which were derived from the vector pRK (Suva
et al., Science, 237:893-896 [1987]), which contains an intron between the
cytomegalovirus immediate early promoter (CMV) and the cDNA that encodes
the polypeptide of interest. The intron of pRK is 139 nucleotides in
length and has a splice donor site derived from the cytomegalovirus
immediate early gene (CMVIE) and a splice acceptor site from an IgG heavy
chain variable region (VH) gene (Eaton et al., Biochemistry, 25:8343 [1986]).
DHFR/intron vectors were constructed by inserting an EcoRV linker
into the BstXI site present in the intron of pRK7. An 830 base-pair
fragment containing a mouse DHFR coding fragment was inserted to obtain
DHFR intron expression vectors which differ only in the sequence that
comprises the splice donor site. Those sequences were altered by
overlapping PCR mutagenesis to obtain sequences that match splice donor
sites found between exons 3 and 4 of normal and mutant Ras genes. PCR was
also used to destroy the splice donor site.

A mouse DHFR cDNA fragment (Simonsen et al., Proc. Natl. Acad. Sci.
USA, 80:2495-2499 [1983]) was inserted into the intron of this vector 59
nucleotides downstream of the splice donor site. The splice donor site of
this vector was altered by mutagenesis to change the ratio of spliced to
non-spliced message in transfected cells. It has previously been shown
that a single nucleotide change (G to A) converted a relatively efficient
splice donor site found in the normal ras gene into an inefficient splice
site (Cohen et al., Nature, 334:119-124 [1988]). This effect has been
demonstrated in the context of the ras gene and confirmed when these
sequences were transferred to human growth hormone constructs (Cohen et
al., Cell, 58:461-472 [1989]). Additionally, a nonfunctional 5' splice
site (GT to CA) was constructed as a control (ΔGT). A polylinker was
inserted 35 nucleotides downstream of the 3' splice site to accept the cDNA
of interest. A vector containing tPA (Pennica et al., Nature, 301:214-221
[1983]) was linearized downstream of the polyadenylation site before it was
introduced into CHO cells (Potter et al., Proc. Natl. Acad. Sci. USA,
81:7161 [1984]).

Plasmid DNAs that contained DHFR/intron, tPA and (a) wild type ras
(WT ras), i.e. Figure 3 (SEQ ID NO: 1), (b) mutant ras, or (c) a
nonfunctional splice donor site (ΔGT) were introduced into CHO DHFR-minus
cells by electroporation. The intron vectors were each linearized
downstream of the polyadenylation site by restriction endonuclease
treatment. The control vector was linearized downstream of the second
polyadenylation site. The DNAs were ethanol precipitated after
phenol/chloroform extraction and were resuspended in 20 µl of 1/10 Tris-EDTA.
Then, 10 µg of DNA was incubated with 10⁷ CHO.dp12 cells (EP 307,247
published 15 March 1989) in 1 ml of PBS on ice for 10 min before
electroporation at 400 volts and 330 µF using a BRL Cell Porator.

Cells were returned to ice for 10 min before being plated into
non-selective medium. After 24 hours, cells were fed nucleoside-free medium
to select for stable DHFR+ clones, which were pooled. The pooled DHFR+ clones
were lysed and mRNAs were prepared.

To prepare the mRNA, RNA was extracted from 5 × 10⁷ cells which were
grown from pools of more than 200 clones derived from the stable
transfection of the three vectors, the essential construction of which is
shown in Figure 1B, and from non-transfected CHO cells. RNA was purified
over oligo(dT) cellulose (Collaborative Biomedical Products). Then, 10 µg of
mRNA was subjected to Northern blotting, which involved running the mRNA on
a 1.2% agarose, 6.6% formaldehyde gel and transferring it to a nylon
filter (Stratagene Duralon-UV membrane); the filter was prehybridized,
probed and washed according to the manufacturer's instructions.

The filter was probed sequentially using probes (shown in Figure 1B)
that would detect (a) the full length message, (b) both full length and
spliced message, or (c) beta actin. Probing with the long probe showed
that the vector that contains the efficient splice donor site (i.e. WT ras)
generates predominantly an mRNA of the size predicted for the spliced
product, while the other two vectors gave rise primarily to an mRNA that
corresponds in size to non-spliced message. The DHFR probe detected only
full length message and demonstrated that the WT ras splice donor derived
vector generates very little full length message with which to confer a
DHFR positive phenotype.

Figure 4 shows the number of DHFR positive colonies obtained after
duplicate electroporations with the three intron vectors described above
and with a conventional vector that has a CMV promoter driving tPA and an
SV40 promoter driving DHFR (see Figure 2). The increase in colony number
parallels the increase in full length message that accumulates with the
modification of the splice donor sites. The conventional vector
efficiently generates colonies and does not differ significantly from the
ΔGT construct.

The level of tPA expression was determined by seeding cells in 1 ml
of F12:DMEM (50:50, with 5% FBS) in 24-well dishes to near confluency.
Growth of the cells continued until the medium was exhausted. The medium
was then assayed by ELISA for tPA production. Briefly, anti-tPA antibody was
coated onto the wells of an ELISA microtiter plate, and media samples were
added to the wells followed by washing. Binding of the antigen (tPA) was
then quantified using horseradish peroxidase (HRPO) labelled anti-tPA
antibody.
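The ELISA description stops at the HRPO readout. Converting absorbance to tPA concentration normally relies on a standard curve run on the same plate; the sketch below assumes a log-log linear standard curve fitted by ordinary least squares, which is one common choice among several (four-parameter logistic fits are also widely used). The function names and the standard concentrations and absorbances are hypothetical placeholders, not data from Figure 5.

```python
import math


def fit_loglog(standards):
    """Fit log10(absorbance) = slope * log10(conc) + intercept by least squares.

    `standards` is a list of (concentration, absorbance) pairs, all positive.
    """
    xs = [math.log10(c) for c, _ in standards]
    ys = [math.log10(a) for _, a in standards]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def concentration(absorbance, slope, intercept, dilution=1.0):
    """Invert the fitted curve for one sample and correct for its dilution.

    Only meaningful for absorbances inside the range of the standards.
    """
    return dilution * 10 ** ((math.log10(absorbance) - intercept) / slope)


if __name__ == "__main__":
    # Hypothetical tPA standards (ng/ml, absorbance); replace with plate data.
    standards = [(1.0, 0.05), (3.0, 0.14), (10.0, 0.43), (30.0, 1.20)]
    slope, intercept = fit_loglog(standards)
    # A conditioned-medium sample read at A = 0.30 after a 1:100 dilution.
    print(f"{concentration(0.30, slope, intercept, dilution=100):.0f} ng/ml")
```

Samples that read above the top standard are best re-assayed at a higher dilution rather than extrapolated.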

Figure 5A depicts the titers of secreted tPA protein after pooling
the clones of each group shown in Figure 4. While the number of colonies
increased with a weakening of splice donor function, the inverse was seen
with respect to tPA expression. The expression levels are consistent with
the RNA products that are observed; as more of the dicistronic message is
spliced, an increased amount of message will contain tPA as the first open
reading frame, resulting in increased tPA expression. A mutation of GT to
CA in the splice donor site results in an abundance of DHFR positive
colonies which express undetectable levels of tPA, possibly resulting from
inefficient utilization of the second AUG. Importantly, Figure 5A also
shows that expression levels obtained from one of the dicistronic vectors
(with WT ras SD) were about threefold higher than those obtained with the
control vector containing a CMV promoter/enhancer driving tPA, an SV40
promoter/enhancer controlling DHFR and SV40 polyadenylation signals
controlling the expression of tPA and DHFR.

Additionally, the homogeneity of expression in the pools was
investigated. Figure 5B shows that all 20 clones generated by the WT ras
splice donor site derived dicistronic vector express detectable levels of
tPA, while only 4 of 20 clones generated by the control vector express tPA.
None of the clones transfected with the non-splicing (ΔGT) vector expressed
tPA levels detectable by ELISA. This finding is consistent with previous
observations that relatively few clones generated by conventional vectors
make useful levels of protein.

Expression of tPA was increased following methotrexate amplification
of pools. Figure 5C shows that 2 of the dicistronic vector derived pools
(i.e. with WT ras and mutant ras SD sites) increased in expression markedly
(8.4 and 7.7 fold), while the pool generated by the conventional vector
increased only slightly (2.8 fold) when each was subjected to 200 nM
methotrexate (Mtx). An overall increase of 9 fold was obtained using the
best dicistronic (WT ras SD) vector versus the conventional vector following
amplification. Growth of the highest expressing amplified pool in nutrient
rich production medium yielded titers of 4.2 µg/ml tPA.
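The 9-fold figure can be cross-checked from two earlier numbers, as a consistency check rather than a calculation reported in the text: before amplification the WT ras SD dicistronic pool was about 3-fold above the conventional pool (Figure 5A), and amplification then raised the two pools 8.4-fold and 2.8-fold respectively. The helper name is hypothetical.

```python
def post_amplification_ratio(pre_ratio: float, fold_test: float, fold_control: float) -> float:
    """Titer ratio (test/control) after amplification.

    pre_ratio: test/control titer ratio before amplification
    fold_test: fold increase of the test pool on amplification
    fold_control: fold increase of the control pool on amplification
    """
    return pre_ratio * fold_test / fold_control


if __name__ == "__main__":
    # WT ras SD dicistronic pool vs conventional pool, tPA (Example 1).
    print(round(post_amplification_ratio(3.0, 8.4, 2.8), 1))  # 9.0
```

Because "about threefold" is itself approximate, the agreement with the reported 9-fold overall increase indicates consistency, not an independent confirmation.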

It was shown that manipulation of the splice donor sequence alters
the ratio of spliced to full length message and the number of colonies that
form in selective medium. It was also shown that dicistronic expression
vectors generate clones that express high levels of recombinant proteins.
Surprisingly, it was possible to isolate high expressors which had the
efficient WT ras splice donor site by selection for DHFR+ cells, despite the
efficiency with which the DHFR gene was spliced from the RNA precursors
formed in these cells.

EXAMPLE 2 

TNFr-IgG production using the dicistronic expression vectors
To demonstrate the general applicability of this approach, a second product
was evaluated in the dicistronic vector system containing, as the DNA of
interest, an immunoadhesin (TNFr-IgG) capable of binding tumor necrosis
factor (TNF) (Ashkenazi et al., Proc. Natl. Acad. Sci. USA, 88:10535-10539
[1991]). The experiments described in Example 1 above were essentially
repeated except that the product gene encoded the immunoadhesin TNFr-IgG.
Plasmid DNAs that contained a TNFr-IgG cDNA and (a) WT ras, i.e. Figure
6 (SEQ ID NO: 2), (b) mutant ras or (c) a nonfunctional splice donor site
(ΔGT) were introduced into the dp12.CHO cells as discussed for Example 1.
See Figure 1C for an illustration of the DNA constructs.
It was discovered that the numbers of DHFR positive colonies generated
by the three vectors were similar to those seen with the tPA constructs.
Expression of TNFr-IgG also paralleled that seen with the tPA constructs
(Figure 7A). Amplification of pools from two of the constructs showed a
marked increase in expression of immunoadhesin (9.6 and 6.8 fold) (Figure
7B). The best of these amplified pools expressed 9.5 µg/ml when grown in
nutrient rich production medium.

Thus, it was again shown that dicistronic expression vectors generate
clones that express high levels of recombinant proteins. Furthermore,
contrary to expectations, it was discovered that isolation of high product
expressing DHFR+ host cells was possible using an efficient splice donor
site (i.e. the WT ras splice donor site).

EXAMPLE 3 

Antibody production using a dicistronic expression vector 

The usefulness of this system for antibody expression was evaluated
by testing production of an antibody directed against IgE (Presta et al.,
Journal of Immunology, 151:2623-2632 [1993]). Further, the flexibility of
the system with regard to transcription initiation was tested by replacing
the CMV promoter/enhancer present in the previous vectors with the
promoter/enhancer derived from the early region of SV40 virus (Griffin,
B., Structure and Genomic Organization of SV40 and Polyoma Virus, in J.
Tooze [Ed.], DNA Tumor Viruses, Cold Spring Harbor Laboratory, Cold Spring
Harbor, New York). The heavy chain of the antibody was inserted downstream
of DHFR as described for the earlier tPA and TNFr-IgG constructs.
Additionally, a new splice donor site sequence (GAC:GTAAGT) was engineered
into the vector; it matches the consensus splice donor site more closely
than did the splice donor sites present in the vectors tested in Examples
1 and 2. The resultant expression vector is shown in Figures 1D and 9.
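Comparing a candidate splice donor such as GAC:GTAAGT against a consensus can be done with a simple scan that understands IUPAC ambiguity codes. The sketch below is a hypothetical helper, not a tool used in the examples. It assumes the commonly cited consensus GTRAGT (R = A or G) for the first six intron nucleotides, counts mismatches in every window of the top strand, and reports 0-based start positions within a mismatch budget. Real splice-site strength also depends on the exonic positions and on more distant sequence, which this scan ignores.

```python
# Subset of IUPAC nucleotide codes used in consensus strings.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "M": "AC", "K": "GT", "N": "ACGT",
}


def donor_hits(seq: str, consensus: str = "GTRAGT", max_mismatches: int = 0) -> list[int]:
    """Return 0-based start positions where `seq` matches `consensus`
    with at most `max_mismatches` mismatching positions (top strand only)."""
    seq = seq.upper()
    k = len(consensus)
    hits = []
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        # A base mismatches if it is not among the bases allowed by the code.
        mismatches = sum(base not in IUPAC[code] for base, code in zip(window, consensus))
        if mismatches <= max_mismatches:
            hits.append(i)
    return hits


if __name__ == "__main__":
    # The Example 3 donor written without the exon:intron colon.
    print(donor_hits("GACGTAAGT"))  # [3]
```

Raising `max_mismatches` to 1 or 2 turns the same scan into a crude way of listing weaker, near-consensus donors.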

It was discovered that this vector produced fewer colonies than the
vectors previously tested and produced predominantly a spliced RNA
product. A second vector was constructed to have the light chain of the
antibody under control of the SV40 promoter/enhancer and poly-A, and the
hygromycin B resistance gene under control of the CMV promoter/enhancer and
SV40 poly-A. These vectors were linearized at unique HpaI sites downstream
of the poly-A signal, mixed at a ratio of light chain vector to heavy chain
vector of 10:3 and electroporated into CHO cells using an optimized
protocol (as discussed in Examples 1 and 2).
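The text does not state whether the 10:3 ratio refers to DNA mass or to numbers of molecules. If it is a mass ratio, the corresponding molar (copy-number) ratio depends on the two plasmid lengths, because moles of a double-stranded plasmid are proportional to mass divided by length. The helper name and the plasmid sizes below are hypothetical placeholders, not the sizes of the vectors described.

```python
def molar_ratio(mass_a: float, length_a_bp: int, mass_b: float, length_b_bp: int) -> float:
    """Molar ratio a:b of two double-stranded plasmids from masses and lengths.

    Moles of dsDNA = mass / (length_bp * roughly 650 g/mol per bp); the
    per-bp constant cancels in the ratio, so only mass/length matters.
    """
    return (mass_a / length_a_bp) / (mass_b / length_b_bp)


if __name__ == "__main__":
    # Hypothetical sizes: light chain/hygromycin vector 6,000 bp,
    # heavy chain dicistronic vector 7,500 bp, mixed 10:3 by mass.
    print(round(molar_ratio(10, 6000, 3, 7500), 2))  # 4.17
```

With plasmids of similar size the mass and molar ratios nearly coincide, so the distinction matters mainly when the two vectors differ substantially in length.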

Figure 11 shows the levels of antibody expressed by clones and pools
after selection in hygromycin B followed by selection for DHFR expression.
All 20 of the clones analyzed expressed high levels of antibody when grown
in rich medium and varied from one another by only a factor of four. A
pool of antibody producing clones was generated and assayed shortly after
it was established. That pool was grown continuously for 6 weeks without
a significant decrease in productivity, demonstrating that its stability was
sufficient to generate gram quantities of protein from its large scale
culture.

The pool was subjected to methotrexate amplification at 200 nM and 1 µM
and achieved a greater than 2 fold increase in antibody titer. The 1 µM Mtx
resistant pool achieved a titer of 41 mg/L when grown under optimal
conditions in suspension culture.
The structure of the expressed antibody was examined. Proteins
expressed by the 200 nM methotrexate resistant pool and by a well
characterized expression clone generated by conventional vectors (Presta
et al. [1993], supra) were metabolically labeled with ³⁵S-labeled cysteine
and methionine. In particular, confluent 35 mm plates of cells were
metabolically labeled with 50 µCi each of ³⁵S-methionine and ³⁵S-cysteine
(Amersham) in serum-free, cysteine- and methionine-free F12:DMEM. After one
hour, nutrient rich production medium was added and labeled proteins were
allowed to "chase" into the medium for six more hours. Proteins were run
on a 12% SDS/PAGE gel (NOVEX) non-reduced or following reduction with
β-mercaptoethanol. Dried gels were exposed to film for 16 hours. CHO
control cells were also labeled.

The majority of the antibody protein is secreted with a molecular
weight of about 155 kilodaltons, consistent with a properly
disulfide-linked antibody molecule with 2 light and 2 heavy chains. Upon
reduction the molecular weight shifts to 2 approximately equally abundant
proteins of 22.5 and 55 kilodaltons. The protein generated from the pool is
indistinguishable from the antibody produced by the well characterized
expression clone, with no apparent increase of free heavy or light chain
expressed by the pool.
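The band sizes are internally consistent: an antibody built from two 22.5 kDa light chains and two 55 kDa heavy chains sums to about 155 kDa, the non-reduced size observed. The short check below restates that arithmetic (the helper name is hypothetical) and ignores the negligible mass change, two hydrogens per disulfide bond, that accompanies assembly. The same helper gives the sizes of partially assembled species, such as a 77.5 kDa heavy-light half molecule, that one would look for as evidence of unbalanced chain expression.

```python
def assembled_mass_kda(light_kda: float, heavy_kda: float, n_light: int = 2, n_heavy: int = 2) -> float:
    """Approximate mass of an assembled antibody species from reduced chain masses."""
    return n_light * light_kda + n_heavy * heavy_kda


if __name__ == "__main__":
    print(assembled_mass_kda(22.5, 55.0))        # 155.0 (H2L2)
    print(assembled_mass_kda(22.5, 55.0, 1, 1))  # 77.5 (HL half molecule)
```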

CONCLUSION 

The efficient expression system described herein utilizes vectors
consisting of promoter/enhancer elements followed by an intron containing
the selectable marker coding sequence, followed by the cDNA of interest and
a polyadenylation signal.

Several splice donor site sequences were tested for their effect on
colony number and expression of the cDNA of interest. A nonfunctional
splice donor site, splice donor sites found in an intron between exons 3
and 4 of mutant (mutant ras) and normal (WT ras) forms of the Harvey Ras
gene, and another efficient SD site (see Example 3) were used. The vectors
were designed to direct expression of dicistronic primary transcripts.
Within a transfected cell some of the transcripts remain full length while
the remainder are spliced to excise the DHFR coding sequence. When the
splice donor site is weakened or destroyed, an increase in colony number
is observed.

Expression levels show the inverse pattern, with the most efficient
splice donor sites generating the highest levels of tPA, TNFr immunoadhesin
or anti-IgE VH.

The homogeneity of expression of clones generated by the ras splice
donor site intron DHFR vectors was compared to that of clones generated from
a conventional vector with a separate promoter/enhancer and polyadenylation
signal for each of DHFR and tPA. The DHFR intron vector gives rise to
colonies that are much more homogeneous with regard to expression than
those generated by the conventional vector. Non-expressing clones derived
from the conventional vector may be the result of breaks in the tPA or
TNFr-IgG domain of the plasmid during integration into the genome or the
result of methylation of promoter elements (Busslinger et al., Cell,
34:197-206 [1983]; Watt et al., Genes and Development, 2:1136-1143 [1988])
driving tPA or TNFr-IgG expression. Promoter silencing by methylation or
breaks in the DHFR-intron vectors would very likely render them incapable
of conferring a DHFR positive phenotype.

It was found that pools generated by the DHFR-intron vectors could
be amplified in methotrexate and would increase in expression by a factor
of 8.4 (tPA) or 9.8 (TNFr-IgG). Pools from conventional vectors increased
by only 2.8 and 3.0 fold for tPA and TNFr-IgG when amplified similarly.
Amplified pools resulted in 9 fold higher tPA levels and 15 fold higher
TNFr-IgG levels when compared to the conventional vector amplified pools.

Without being limited to any theory, the increase in expression of
methotrexate resistant pools derived from the dicistronic vectors is likely
due to the transcriptional linkage of DHFR and the product; when cells are
selected for increased DHFR expression they consistently overexpress
product. Conventional approaches lack this linkage between selectable
marker and cDNA expression, and therefore methotrexate amplification often
generates DHFR overexpression without a concomitant increase in product
expression.

A further increase of 4 and 6.3 fold in expression was obtained when
amplified tPA and TNFr-IgG pools were transferred from the media used for
the selections and amplifications to a nutrient rich production medium.

In Example 3, the expression vector had a splice donor site that more
closely matches the consensus splice donor sequence and had the heavy chain
of a humanized anti-IgE antibody inserted downstream. This vector was
linearized and co-electroporated with a second linearized vector that
expresses the hygromycin resistance gene and the light chain of the
antibody, each under the control of its own promoter/enhancer and poly-A
signals. An excess of light chain expression vector over the heavy chain
dicistronic expression vector was used to bias in favor of light chain
expression. Clones and a pool were generated after hygromycin B and DHFR
selections. The clones were found to express relatively consistent, high
levels of antibody, as did the pool. The 1 µM pool achieved a titer of
41 mg/L when grown under optimal conditions in suspension culture.

The anti-IgE antibody was assessed by metabolic labeling followed by
SDS/PAGE under reducing and non-reducing conditions and found to be
indistinguishable from the protein expressed by a highly characterized
clonal cell line. Of particular importance is the finding that no increase
in free light chain is observed in the pool relative to the clone.

A stable expression system for CHO cells has been developed that
produces high levels of recombinant proteins rapidly and with less effort
than that required by other expression systems. The vector system
generates stable clones that consistently express high levels, thereby
reducing the number of clones that must be screened to obtain a highly
productive clonal line. Alternatively, pools have been used to
conveniently generate moderate to high levels of protein. This approach
may be particularly useful when a number of related proteins are to be
expressed and compared.

Without being limited to this theory, it is possible that the vectors that
have very efficient splice donor sites generate very productive clones
because so little transcript remains unspliced that only integration
events that lead to the generation of high levels of RNA produce enough
DHFR protein to give rise to colonies in selective medium. The high level
of spliced message from such clones is then translated into abundant
amounts of the protein of interest. Pools of clones made concurrently by
introducing conventional vectors expressed lower levels of protein, were
unstable with regard to long term expression, and their expression could
not be appreciably increased when the cells were subjected to methotrexate
amplification.
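The selection argument above can be illustrated with a deliberately simple toy model, which is an assumption for illustration rather than a model given in the text. Let s be the fraction of primary transcripts that are spliced. Only the unspliced fraction (1 - s) carries DHFR as the first open reading frame, so a clone survives selection only if its total transcript level T satisfies (1 - s)·T ≥ D, where D is a hypothetical DHFR threshold; product is made from the spliced fraction, s·T. The weakest surviving clone therefore makes s·D/(1 - s) units of product, a floor that rises steeply as s approaches 1. Function names and units are hypothetical.

```python
def min_transcripts_to_survive(s: float, dhfr_threshold: float) -> float:
    """Smallest total transcript level T with (1 - s) * T >= dhfr_threshold."""
    if not 0.0 <= s < 1.0:
        raise ValueError("spliced fraction s must be in [0, 1)")
    return dhfr_threshold / (1.0 - s)


def product_floor(s: float, dhfr_threshold: float = 1.0) -> float:
    """Product output (arbitrary units, s * T) of the weakest clone that survives."""
    return s * min_transcripts_to_survive(s, dhfr_threshold)


if __name__ == "__main__":
    # Illustrative spliced fractions only; the examples do not report s values.
    for s in (0.5, 0.9, 0.99):
        print(f"s = {s:.2f}: product floor = {product_floor(s):.1f}")
```

Run as written, it prints floors of 1.0, 9.0 and 99.0. In this toy model the same inequality also predicts fewer surviving colonies as s increases, matching the qualitative colony-number trend, but it ignores translation from the full-length message and any cost of high expression.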

The system developed herein is versatile in that it allows high
levels of single and multiple subunit polypeptides to be rapidly generated
from clones or pools of stable transfectants. This expression system
combines the advantages of transient expression systems (rapid and
not labor-intensive generation of research amounts of protein) with the
concurrent development of highly productive stable production cell lines.
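The sequence listings that follow use fixed-format lines of ten-base groups followed by a running position counter. When such listings are recovered from scanned documents, a quick programmatic check catches split groups, stray margin numbers and misread counters (for example, a counter printed as "2 50"). The sketch below is a hypothetical helper, not part of the described method: it joins the bases of each sequence line and verifies that the trailing number equals the running length, raising an error on any line it cannot parse. It expects only the sequence lines, not the descriptive header fields.

```python
import re

# One listing line: groups of bases separated by spaces, then the position counter.
LINE_RE = re.compile(r"^([ACGTN ]+?)\s+(\d+)$")


def parse_listing(lines):
    """Join sequence-listing lines into one string, validating position counters.

    Blank lines are skipped; any other line that does not look like
    'BASES BASES ... COUNTER' raises ValueError, as does a counter that
    disagrees with the running sequence length.
    """
    parts = []
    total = 0
    for lineno, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line:
            continue
        match = LINE_RE.match(line)
        if match is None:
            raise ValueError(f"line {lineno}: cannot parse {line!r}")
        bases = match.group(1).replace(" ", "")
        total += len(bases)
        counter = int(match.group(2))
        if counter != total:
            raise ValueError(f"line {lineno}: counter {counter} but running length {total}")
        parts.append(bases)
    return "".join(parts)


if __name__ == "__main__":
    listing = [
        "TTCGAGCTCG CCCGACATTG ATTATTGACT AGTTATTAAT AGTAATCAAT 50",
        "TACGGGGTCA TTAGTTCATA GCCCATATAT GGAGTTCCGC GTTACATAAC 100",
    ]
    seq = parse_listing(listing)
    print(len(seq), seq[:10])  # 100 TTCGAGCTCG
```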
(A) LENGTH: 7360 bases

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:1:

TTCGAGCTCG CCCGACATTG ATTATTGACT AGTTATTAAT AGTAATCAAT 50
TACGGGGTCA TTAGTTCATA GCCCATATAT GGAGTTCCGC GTTACATAAC 100
TTACGGTAAA TGGCCCGCCT GGCTGACCGC CCAACGACCC CCGCCCATTG 150
ACGTCAATAA TGACGTATGT TCCCATAGTA ACGCCAATAG GGACTTTCCA 200
TTGACGTCAA TGGGTGGAGT ATTTACGGTA AACTGCCCAC TTGGCAGTAC 250
ATCAAGTGTA TCATATGCCA AGTACGCCCC CTATTGACGT CAATGACGGT 300
AAATGGCCCG CCTGGCATTA TGCCCAGTAC ATGACCTTAT GGGACTTTCC 350
TACTTGGCAG TACATCTACG TATTAGTCAT CGCTATTACC ATGGTGATGC 400
GGTTTTGGCA GTACATCAAT GGGCGTGGAT AGCGGTTTGA CTCACGGGGA 450
TTTCCAAGTC TCCACCCCAT TGACGTCAAT GGGAGTTTGT TTTGGCACCA 500 
AAATCAACGG GACTTTCCAA AATGTCGTAA CAACTCCGCC CCATTGACGC 550 
AAATGGGCGG TAGGCGTGTA CGGTGGGAGG TCTATATAAG CAGAGCTCGT 600 
TTAGTGAACC GTCAGATCGC CTGGAGACGC CATCCACGCT GTTTTGACCT 650 
CCATAGAAGA CACCGGGACC GATCCAGCCT CCGCGGCCGG GAACGGTGCA 700
TTGGAACGCG GATTCCCCGT GCCAAGAGTG CTGTAAGTAC CGCCTATAGA 750
GCGATAAGAG GATTTTATCC CCGCTGCCAT CATGGTTCGA CCATTGAACT 800
GCATCGTCGC CGTGTCCCAA AATATGGGGA TTGGCAAGAA CGGAGACCTA 850
CCCTGCCCTC CGCTCAGGAA CGCGTTCAAG TACTTCCAAA GAATGACCAC 900 
AACCTCTTCA GTGGAAGGTA AACAGAATCT GGTGATTATG GGTAGGAAAA 950
CCTGGTTCTC CATTCCTGAG AAGAATCGAC CTTTAAAGGA CAGAATTAAT 1000 
ATAGTTCTCA GTAGAGAACT CAAAGAACCA CCACGAGGAG CTCATTTTCT 1050 
TGCCAAAAGT TTGGATGATG CCTTAAGACT TATTGAACAA CCGGAATTGG 1100 
CAAGTAAAGT AGACATGGTT TGGATAGTCG GAGGCAGTTC TGTTTACCAG 1150 
GAAGCCATGA ATCAACCAGG CCACCTTAGA CTCTTTGTGA CAAGGATCAT 1200
GCAGGAATTT GAAAGTGACA CGTTTTTCCC AGAAATTGAT TTGGGGAAAT 1250
ATAAACCTCT CCCAGAATAC CCAGGCGTCC TCTCTGAGGT CCAGGAGGAA 1300 
AAAGGCATCA AGTATAAGTT TGAAGTCTAC GAGAAGAAAG ACTAACAGGA 1350 
AGATGCTTTC AAGTTCTCTG CTCCCCTCCT AAAGCTATGC ATTTTTATAA 1400 
GACCATGGGA CTTTTGCTGG CTTTAGACCC CCTTGGCTTC GTTAGAACGC 1450
GGCTACAATT AATACATAAC CTTATGTATC ATACACATAG ATTTAGGTGA 1500
CACTATAGAA TAACATCCAC TTTGCCTTTC TCTCCACAGG TGTCACTCCA 1550
GGTCAACTGC ACCTCGGTTC TAAGCTTGGG CTGCAGGTCG CCGTGAATTT 1600
AAGGGACGCT GTGAAGCAAT CATGGATGCA ATGAAGAGAG GGCTCTGCTG 1650
TGTGCTGCTG CTGTGTGGAG CAGTCTTCGT TTCGCCCAGC CAGGAAATCC 1700
ATGCCCGATT CAGAAGAGGA GCCAGATCTT ACCAAGTGAT CTGCAGAGAT 1750
GAAAAAACGC AGATGATATA CCAGCAACAT CAGTCATGGC TGCGCCCTGT 1800
GCTCAGAAGC AACCGGGTGG AATATTGCTG GTGCAACAGT GGCAGGGCAC 1850
AGTGCCACTC AGTGCCTGTC AAAAGTTGCA GCGAGCCAAG GTGTTTCAAC 1900
GGGGGCACCT GCCAGCAGGC CCTGTACTTC TCAGATTTCG TGTGCCAGTG 1950
CCCCGAAGGA TTTGCTGGGA AGTGCTGTGA AATAGATACC AGGGCCACGT 2000
GCTACGAGGA CCAGGGCATC AGCTACAGGG GCACGTGGAG CACAGCGGAG 2050
AGTGGCGCCG AGTGCACCAA CTGGAACAGC AGCGCGTTGG CCCAGAAGCC 2100
CTACAGCGGG CGGAGGCCAG ACGCCATCAG GCTGGGCCTG GGGAACCACA 2150
ACTACTGCAG AAACCCAGAT CGAGACTCAA AGCCCTGGTG CTACGTCTTT 2200
AAGGCGGGGA AGTACAGCTC AGAGTTCTGC AGCACCCCTG CCTGCTCTGA 2250
GGGAAACAGT GACTGCTACT TTGGGAATGG GTCAGCCTAC CGTGGCACGC 2300
ACAGCCTCAC CGAGTCGGGT GCCTCCTGCC TCCCGTGGAA TTCCATGATC 2350
CTGATAGGCA AGGTTTACAC AGCACAGAAC CCCAGTGCCC AGGCACTGGG 2400
CCTGGGCAAA CATAATTACT GCCGGAATCC TGATGGGGAT GCCAAGCCCT 2450
GGTGCCACGT GCTGAAGAAC CGCAGGCTGA CGTGGGAGTA CTGTGATGTG 2500
CCCTCCTGCT CCACCTGCGG CCTGAGACAG TACAGCCAGC CTCAGTTTCG 2550
CATCAAAGGA GGGCTCTTCG CCGACATCGC CTCCCACCCC TGGCAGGCTG 2600
CCATCTTTGC CAAGCACAGG AGGTCGCCCG GAGAGCGGTT CCTGTGCGGG 2650
GGCATACTCA TCAGCTCCTG CTGGATTCTC TCTGCCGCCC ACTGCTTCCA 2700
GGAGAGGTTT CCGCCCCACC ACCTGACGGT GATCTTGGGC AGAACATACC 2750 
GGGTGGTCCC TGGCGAGGAG GAGCAGAAAT TTGAAGTCGA AAAATACATT 2800 
GTCCATAAGG AATTCGATGA TGACACTTAC GACAATGACA TTGCGCTGCT 2850
GCAGCTGAAA TCGGATTCGT CCCGCTGTGC CCAGGAGAGC AGCGTGGTCC 2900 
GCACTGTGTG CCTTCCCCCG GCGGACCTGC AGCTGCCGGA CTGGACGGAG 2950
TGTGAGCTCT CCGGCTACGG CAAGCATGAG GCCTTGTCTC CTTTCTATTC 3000 
GGAGCGGCTG AAGGAGGCTC ATGTCAGACT GTACCCATCC AGCCGCTGCA 3050 
CATCACAACA TTTACTTAAC AGAACAGTCA CCGACAACAT GCTGTGTGCT 3100 
GGAGACACTC GGAGCGGCGG GCCCCAGGCA AACTTGCACG ACGCCTGCCA 3150 
GGGCGATTCG GGAGGCCCCC TGGTGTGTCT GAACGATGGC CGCATGACTT 3200 
TGGTGGGCAT CATCAGCTGG GGCCTGGGCT GTGGACAGAA GGATGTCCCG 3250
GGTGTGTACA CCAAGGTTAC CAACTACCTA GACTGGATTC GTGACAACAT 3300
GCGACCGTGA CCAGGAACAC CCGACTCCTC AAAAGCAAAT GAGATCCCGC 3350 
CTCTTCTTCT TCAGAAGACA CTGCAAAGGC GCAGTGCTTC TCTACAGACT 3400
TCTCCAGACC CACCACACCG CAGAAGCGGG ACGAGACCCT ACAGGAGAGG 3450 
GAAGAGTGCA TTTTCCCAGA TACTTCCCAT TTTGGAAGTT TTCAGGACTT 3500 
GGTCTGATTT CAGGATACTC TGTCAGATGG GAAGACATGA ATGCACACTA 3550 
GCCTCTCCAG GAATGCCTCC TCCCTGGGCA GAAGTGGGGG GAATTCAATC 3600 
GATGGCCGCC ATGGCCCAAC TTGTTTATTG CAGCTTATAA TGGTTACAAA 3650
TAAAGCAATA GCATCACAAA TTTCACAAAT AAAGCATTTT TTTCACTGCA 3700 
TTCTAGTTGT GGTTTGTCCA AACTCATCAA TGTATCTTAT CATGTCTGGA 3750
TCGATCGGGA ATTAATTCGG CGCAGCACCA TGGCCTGAAA TAACCTCTGA 3800
AAGAGGAACT TGGTTAGGTA CCTTCTGAGG CGGAAAGAAC CAGCTGTGGA 3850
ATGTGTGTCA GTTAGGGTGT GGAAAGTCCC CAGGCTCCCC AGCAGGCAGA 3900
AGTATGCAAA GCATGCATCT CAATTAGTCA GCAACCAGGT GTGGAAAGTC 3950
CCCAGGCTCC CCAGCAGGCA GAAGTATGCA AAGCATGCAT CTCAATTAGT 4000
CAGCAACCAT AGTCCCGCCC CTAACTCCGC CCATCCCGCC CCTAACTCCG 4050
CCCAGTTCCG CCCATTCTCC GCCCCATGGC TGACTAATTT TTTTTATTTA 4100 
TGCAGAGGCC GAGGCCGCCT CGGCCTCTGA GCTATTCCAG AAGTAGTGAG 4150
GAGGCTTTTT TGGAGGCCTA GGCTTTTGCA AAAAGCTGTT AACAGCTTGG 4200
CACTGGCCGT CGTTTTACAA CGTCGTGACT GGGAAAACCC TGGCGTTACC 4250
CAACTTAATC GCCTTGCAGC ACATCCCCCC TTCGCCAGCT GGCGTAATAG 4300
CGAAGAGGCC CGCACCGATC GCCCTTCCCA ACAGTTGCGT AGCCTGAATG 4350
GCGAATGGCG CCTGATGCGG TATTTTCTCC TTACGCATCT GTGCGGTATT 4400
TCACACCGCA TACGTCAAAG CAACCATAGT ACGCGCCCTG TAGCGGCGCA 4450
TTAAGCGCGG CGGGTGTGGT GGTTACGCGC AGCGTGACCG CTACACTTGC 4500
CAGCGCCCTA GCGCCCGCTC CTTTCGCTTT CTTCCCTTCC TTTCTCGCCA 4550 
CGTTCGCCGG CTTTCCCCGT CAAGCTCTAA ATCGGGGGCT CCCTTTAGGG 4600
TTCCGATTTA GTGCTTTACG GCACCTCGAC CCCAAAAAAC TTGATTTGGG 4650
TGATGGTTCA CGTAGTGGGC CATCGCCCTG ATAGACGGTT TTTCGCCCTT 4700
TGACGTTGGA GTCCACGTTC TTTAATAGTG GACTCTTGTT CCAAACTGGA 4750
ACAACACTCA ACCCTATCTC GGGCTATTCT TTTGATTTAT AAGGGATTTT 4800
GCCGATTTCG GCCTATTGGT TAAAAAATGA GCTGATTTAA CAAAAATTTA 4850 
ACGCGAATTT TAACAAAATA TTAACGTTTA CAATTTTATG GTGCACTCTC 4900
AGTACAATCT GCTCTGATGC CGCATAGTTA AGCCAACTCC GCTATCGCTA 4950
CGTGACTGGG TCATGGCTGC GCCCCGACAC CCGCCAACAC CCGCTGACGC 5000 
GCCCTGACGG GCTTGTCTGC TCCCGGCATC CGCTTACAGA CAAGCTGTGA 5050 
CCGTCTCCGG GAGCTGCATG TGTCAGAGGT TTTCACCGTC ATCACCGAAA 5100 
CGCGCGAGGC AGTATTCTTG AAGACGAAAG GGCCTCGTGA TACGCCTATT 5150 
TTTATAGGTT AATGTCATGA TAATAATGGT TTCTTAGACG TCAGGTGGCA 5200 
CTTTTCGGGG AAATGTGCGC GGAACCCCTA TTTGTTTATT TTTCTAAATA 5250 
CATTCAAATA TGTATCCGCT CATGAGACAA TAACCCTGAT AAATGCTTCA 5300 
ATAATATTGA AAAAGGAAGA GTATGAGTAT TCAACATTTC CGTGTCGCCC 5350 
TTATTCCCTT TTTTGCGGCA TTTTGCCTTC CTGTTTTTGC TCACCCAGAA 5400 
ACGCTGGTGA AAGTAAAAGA TGCTGAAGAT CAGTTGGGTG CACGAGTGGG 5450 
TTACATCGAA CTGGATCTCA ACAGCGGTAA GATCCTTGAG AGTTTTCGCC 5500 
CCGAAGAACG TTTTCCAATG ATGAGCACTT TTAAAGTTCT GCTATGTGGC 5550 
GCGGTATTAT CCCGTGATGA CGCCGGGCAA GAGCAACTCG GTCGCCGCAT 5600 
ACACTATTCT CAGAATGACT TGGTTGAGTA CTCACCAGTC ACAGAAAAGC 5650 
ATCTTACGGA TGGCATGACA GTAAGAGAAT TATGCAGTGC TGCCATAACC 5700 
ATGAGTGATA ACACTGCGGC CAACTTACTT CTGACAACGA TCGGAGGACC 5750 
GAAGGAGCTA ACCGCTTTTT TGCACAACAT GGGGGATCAT GTAACTCGCC 5800 
TTGATCGTTG GGAACCGGAG CTGAATGAAG CCATACCAAA CGACGAGCGT 5850 
GACACCACGA TGCCAGCAGC AATGGCAACA ACGTTGCGCA AACTATTAAC 5900
TGGCGAACTA CTTACTCTAG CTTCCCGGCA ACAATTAATA GACTGGATGG 5950 
AGGCGGATAA AGTTGCAGGA CCACTTCTGC GCTCGGCCCT TCCGGCTGGC 6000 
TGGTTTATTG CTGATAAATC TGGAGCCGGT GAGCGTGGGT CTCGCGGTAT 6050
CATTGCAGCA CTGGGGCCAG ATGGTAAGCC CTCCCGTATC GTAGTTATCT 6100 
ACACGACGGG GAGTCAGGCA ACTATGGATG AACGAAATAG ACAGATCGCT 6150 
GAGATAGGTG CCTCACTGAT TAAGCATTGG TAACTGTCAG ACCAAGTTTA 6200 
CTCATATATA CTTTAGATTG ATTTAAAACT TCATTTTTAA TTTAAAAGGA 6250 
TCTAGGTGAA GATCCTTTTT GATAATCTCA TGACCAAAAT CCCTTAACGT 6300
GAGTTTTCGT TCCACTGAGC GTCAGACCCC GTAGAAAAGA TCAAAGGATC 6350
TTCTTGAGAT CCTTTTTTTC TGCGCGTAAT CTGCTGCTTG CAAACAAAAA 6400
AACCACCGCT ACCAGCGGTG GTTTGTTTGC CGGATCAAGA GCTACCAACT 6450 
CTTTTTCCGA AGGTAACTGG CTTCAGCAGA GCGCAGATAC CAAATACTGT 6500 
CCTTCTAGTG TAGCCGTAGT TAGGCCACCA CTTCAAGAAC TCTGTAGCAC 6550
CGCCTACATA CCTCGCTCTG CTAATCCTGT TACCAGTGGC TGCTGCCAGT 6600
GGCGATAAGT CGTGTCTTAC CGGGTTGGAC TCAAGACGAT AGTTACCGGA 6650 
TAAGGCGCAG CGGTCGGGCT GAACGGGGGG TTCGTGCACA CAGCCCAGCT 6700 
TGGAGCGAAC GACCTACACC GAACTGAGAT ACCTACAGCG TGAGCATTGA 6750 
GAAAGCGCCA CGCTTCCCGA AGGGAGAAAG GCGGACAGGT ATCCGGTAAG 6800 
CGGCAGGGTC GGAACAGGAG AGCGCACGAG GGAGCTTCCA GGGGGAAACG 6850 
CCTGGTATCT TTATAGTCCT GTCGGGTTTC GCCACCTCTG ACTTGAGCGT 6900 
CGATTTTTGT GATGCTCGTC AGGGGGGCGG AGCCTATGGA AAAACGCCAG 6950 
CAACGCGGCC TTTTTACGGT TCCTGGCCTT TTGCTGGCCT TTTGCTCACA 7000
TGTTCTTTCC TGCGTTATCC CCTGATTCTG TGGATAACCG TATTACCGCC 7050 
TTTGAGTGAG CTGATACCGC TCGCCGCAGC CGAACGACCG AGCGCAGCGA 7100 
GTCAGTGAGC GAGGAAGCGG AAGAGCGCCC AATACGCAAA CCGCCTCTCC 7150 
CCGCGCGTTG GCCGATTCAT TAATCCAGCT GGCACGACAG GTTTCCCGAC 7200
TGGAAAGCGG GCAGTGAGCG CAACGCAATT AATGTGAGTT ACCTCACTCA 7250 
TTAGGCACCC CAGGCTTTAC ACTTTATGCT TCCGGCTCGT ATGTTGTGTG 7300
GAATTGTGAG CGGATAACAA TTTCACACAG GAAACAGCTA TGACCATGAT 7350 
TACGAATTAA 7360 

(2) INFORMATION FOR SEQ ID NO: 2: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 6889 bases

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 2:
TTCGAGCTCG CCCGACATTG ATTATTGACT AGTTATTAAT AGTAATCAAT 50 
TACGGGGTCA TTAGTTCATA GCCCATATAT GGAGTTCCGC GTTACATAAC 100 
TTACGGTAAA TGGCCCGCCT GGCTGACCGC CCAACGACCC CCGCCCATTG 150 
ACGTCAATAA TGACGTATGT TCCCATAGTA ACGCCAATAG GGACTTTCCA 200 
TTGACGTCAA TGGGTGGAGT ATTTACGGTA AACTGCCCAC TTGGCAGTAC 250 
ATCAAGTGTA TCATATGCCA AGTACGCCCC CTATTGACGT CAATGACGGT 300
AAATGGCCCG CCTGGCATTA TGCCCAGTAC ATGACCTTAT GGGACTTTCC 350
TACTTGGCAG TACATCTACG TATTAGTCAT CGCTATTACC ATGGTGATGC 400 
GGTTTTGGCA GTACATCAAT GGGCGTGGAT AGCGGTTTGA CTCACGGGGA 450
TTTCCAAGTC TCCACCCCAT TGACGTCAAT GGGAGTTTGT TTTGGCACCA 500 
AAATCAACGG GACTTTCCAA AATGTCGTAA CAACTCCGCC CCATTGACGC 550 
AAATGGGCGG TAGGCGTGTA CGGTGGGAGG TCTATATAAG CAGAGCTCGT 600 
TTAGTGAACC GTCAGATCGC CTGGAGACGC CATCCACGCT GTTTTGACCT 650 
CCATAGAAGA CACCGGGACC GATCCAGCCT CCGCGGCCGG GAACGGTGCA 700 
TTGGAACGCG GATTCCCCGT GCCAAGAGTG CTGTAAGTAC CGCCTATAGA 750
GCGATAAGAG GATTTTATCC CCGCTGCCAT CATGGTTCGA CCATTGAACT 800
GCATCGTCGC CGTGTCCCAA AATATGGGGA TTGGCAAGAA CGGAGACCTA 850
CCCTGCCCTC CGCTCAGGAA CGCGTTCAAG TACTTCCAAA GAATGACCAC 900
AACCTCTTCA GTGGAAGGTA AACAGAATCT GGTGATTATG GGTAGGAAAA 950
CCTGGTTCTC CATTCCTGAG AAGAATCGAC CTTTAAAGGA CAGAATTAAT 1000 
ATAGTTCTCA GTAGAGAACT CAAAGAACCA CCACGAGGAG CTCATTTTCT 1050 
TGCCAAAAGT TTGGATGATG CCTTAAGACT TATTGAACAA CCGGAATTGG 1100 
CAAGTAAAGT AGACATGGTT TGGATAGTCG GAGGCAGTTC TGTTTACCAG 1150
GAAGCCATGA ATCAACCAGG CCACCTTAGA CTCTTTGTGA CAAGGATCAT 1200
GCAGGAATTT GAAAGTGACA CGTTTTTCCC AGAAATTGAT TTGGGGAAAT 1250 
ATAAACCTCT CCCAGAATAC CCAGGCGTCC TCTCTGAGGT CCAGGAGGAA 1300
AAAGGCATCA AGTATAAGTT TGAAGTCTAC GAGAAGAAAG ACTAACAGGA 1350 
AGATGCTTTC AAGTTCTCTG CTCCCCTCCT AAAGCTATGC ATTTTTATAA 1400
GACCATGGGA CTTTTGCTGG CTTTAGACCC CCTTGGCTTC GTTAGAACGC 1450
GGCTACAATT AATACATAAC CTTATGTATC ATACACATAG ATTTAGGTGA 1500 
CACTATAGAA TAACATCCAC TTTGCCTTTC TCTCCACAGG TGTCACTCCA 1550
GGTCAACTGC ACCTCGGTTC TATCGATTGA ATTCCCCGGC CATAGCTGTC 1600
TGGCATGGGC CTCTCCACCG TGCCTGACCT GCTGCTGCCG CTGGTGCTCC 1650
TGGAGCTGTT GGTGGGAATA TACCCCTCAG GGGTTATTGG ACTGGTCCCT 1700
CACCTAGGGG ACAGGGAGAA GAGAGATAGT GTGTGTCCCC AAGGAAAATA 1750
TATCCACCCT CAAAATAATT CGATTTGCTG TACCAAGTGC CACAAAGGAA 1800
CCTACTTGTA CAATGACTGT CCAGGCCCGG GGCAGGATAC GGACTGCAGG 1850 
GAGTGTGAGA GCGGCTCCTT CACCGCTTCA GAAAACCACC TCAGACACTG 1900
CCTCAGCTGC TCCAAATGCC GAAAGGAAAT GGGTCAGGTG GAGATCTCTT 1950 
CTTGCACAGT GGACCGGGAC ACCGTGTGTG GCTGCAGGAA GAACCAGTAC 2000 
CGGCATTATT GGAGTGAAAA CCTTTTCCAG TGCTTCAATT GCAGCCTCTG 2050 
CCTCAATGGG ACCGTGCACC TCTCCTGCCA GGAGAAACAG AACACCGTGT 2100 
GCACCTGCCA TGCAGGTTTC TTTCTAAGAG AAAACGAGTG TGTCTCCTGT 2150 
AGTAACTGTA AGAAAAGCCT GGAGTGCACG AAGTTGTGCC TACCCCAGAT 2200 
TGAGAATGTT AAGGGCACTG AGGACTCAGG CACCACAGAC AAGAGAGTTG 2250 
AGCTCAAAAC CCCACTTGGT GACACAACTC ACACATGCCC ACGGTGCCCA 2300
GAGCCCAAAT CTTGTGACAC ACCTCCCCCG TGCCCACGGT GCCCAGAGCC 2350 
CAAATCTTGT GACACACCTC CCCCATGCCC ACGGTGCCCA GAGCCCAAAT 2400 
CTTGTGACAC ACCTCCCCCA TGCCCACGGT GCCCAGCACC TGAACTCCTG 2450 
GGAGGACCGT CAGTCTTCCT CTTCCCCCCA AAACCCAAGG ATACCCTTAT 2500 
GATTTCCCGG ACCCCTGAGG TCACGTGCGT GGTGGTGGAC GTGAGCCACG 2550 
AAGACCCCGA GGTCCAGTTC AAGTGGTACG TGGACGGCGT GGAGGTGCAT 2600 
AATGCCAAGA CAAAGCCGCG GGAGGAGCAG TTCAACAGCA CGTTCCGTGT 2650 
GGTCAGCGTC CTCACCGTCC TGCACCAGGA CTGGCTGAAC GGCAAGGAGT 2700 
ACAAGTGCAA GGTCTCCAAC AAAGCCCTCC CAGCCCCCAT CGAGAAAACC 2750
ATCTCCAAAA CCAAAGGACA GCCCCGAGAA CCACAGGTGT ACACCCTGCC 2800 
CCCATCCCGG GAGGAGATGA CCAAGAACCA GGTCAGCCTG ACCTGCCTGG 2850 
TCAAAGGCTT CTACCCCAGC GACATCGCCG TGGAGTGGGA GAGCAGCGGG 2900 
CAGCCGGAGA ACAACTACAA CACCACGCCT CCCATGCTGG ACTCCGACGG 2950
CTCCTTCTTC CTCTACAGCA AGCTCACCGT GGACAAGAGC AGGTGGCAGC 3000
AGGGGAACAT CTTCTCATGC TCCGTGATGC ATGAGGCTCT GCACAACCGC 3050
TTCACGCAGA AGAGCCTCTC CCTGTCTCCG GGTAAATGAG TGCGACGGCC 3100 
GGGGATCCTC TAGAGTCGAC CTGCAGAAGC TTGGCCGCCA TGGCCCAACT 3150
TGTTTATTGC AGCTTATAAT GGTTACAAAT AAAGCAATAG CATCACAAAT 3200
TTCACAAATA AAGCATTTTT TTCACTGCAT TCTAGTTGTG GTTTGTCCAA 3250 
ACTCATCAAT GTATCTTATC ATGTCTGGAT CGATCGGGAA TTAATTCGGC 3300 
GCAGCACCAT GGCCTGAAAT AACCTCTGAA AGAGGAACTT GGTTAGGTAC 3350 
CTTCTGAGGC GGAAAGAACC AGCTGTGGAA TGTGTGTCAG TTAGGGTGTG 3400
GAAAGTCCCC AGGCTCCCCA GCAGGCAGAA GTATGCAAAG CATGCATCTC 3450
AATTAGTCAG CAACCAGGTG TGGAAAGTCC CCAGGCTCCC CAGCAGGCAG 3500 
AAGTATGCAA AGCATGCATC TCAATTAGTC AGCAACCATA GTCCCGCCCC 3550 
TAACTCCGCC CATCCCGCCC CTAACTCCGC CCAGTTCCGC CCATTCTCCG 3600 
CCCCATGGCT GACTAATTTT TTTTATTTAT GCAGAGGCCG AGGCCGCCTC 3650 
GGCCTCTGAG CTATTCCAGA AGTAGTGAGG AGGCTTTTTT GGAGGCCTAG 3700
GCTTTTGCAA AAAGCTGTTA ACAGCTTGGC ACTGGCCGTC GTTTTACAAC 3750 
GTCGTGACTG GGAAAACCCT GGCGTTACCC AACTTAATCG CCTTGCAGCA 3800 
CATCCCCCCT TCGCCAGCTG GCGTAATAGC GAAGAGGCCC GCACCGATCG 3850
CCCTTCCCAA CAGTTGCGTA GCCTGAATGG CGAATGGCGC CTGATGCGGT 3900
ATTTTCTCCT TACGCATCTG TGCGGTATTT CACACCGCAT ACGTCAAAGC 3950
AACCATAGTA CGCGCCCTGT AGCGGCGCAT TAAGCGCGGC GGGTGTGGTG 4000
GTTACGCGCA GCGTGACCGC TACACTTGCC AGCGCCCTAG CGCCCGCTCC 4050 
TTTCGCTTTC TTCCCTTCCT TTCTCGCCAC GTTCGCCGGC TTTCCCCGTC 4100 
AAGCTCTAAA TCGGGGGCTC CCTTTAGGGT TCCGATTTAG TGCTTTACGG 4150 
CACCTCGACC CCAAAAAACT TGATTTGGGT GATGGTTCAC GTAGTGGGCC 4200
ATCGCCCTGA TAGACGGTTT TTCGCCCTTT GACGTTGGAG TCCACGTTCT 4250 
TTAATAGTGG ACTCTTGTTC CAAACTGGAA CAACACTCAA CCCTATCTCG 4300
GGCTATTCTT TTGATTTATA AGGGATTTTG CCGATTTCGG CCTATTGGTT 4350
AAAAAATGAG CTGATTTAAC AAAAATTTAA CGCGAATTTT AACAAAATAT 4400 
TAACGTTTAC AATTTTATGG TGCACTCTCA GTACAATCTG CTCTGATGCC 4450 
GCATAGTTAA GCCAACTCCG CTATCGCTAC GTGACTGGGT CATGGCTGCG 4500
CCCCGACACC CGCCAACACC CGCTGACGCG CCCTGACGGG CTTGTCTGCT 4550 
CCCGGCATCC GCTTACAGAC AAGCTGTGAC CGTCTCCGGG AGCTGCATGT 4600
GTCAGAGGTT TTCACCGTCA TCACCGAAAC GCGCGAGGCA GTATTCTTGA 4650
AGACGAAAGG GCCTCGTGAT ACGCCTATTT TTATAGGTTA ATGTCATGAT 4700 
AATAATGGTT TCTTAGACGT CAGGTGGCAC TTTTCGGGGA AATGTGCGCG 4750 
GAACCCCTAT TTGTTTATTT TTCTAAATAC ATTCAAATAT GTATCCGCTC 4800 
ATGAGACAAT AACCCTGATA AATGCTTCAA TAATATTGAA AAAGGAAGAG 4850
TATGAGTATT CAACATTTCC GTGTCGCCCT TATTCCCTTT TTTGCGGCAT 4900 
TTTGCCTTCC TGTTTTTGCT CACCCAGAAA CGCTGGTGAA AGTAAAAGAT 4950
GCTGAAGATC AGTTGGGTGC ACGAGTGGGT TACATCGAAC TGGATCTCAA 5000 
CAGCGGTAAG ATCCTTGAGA GTTTTCGCCC CGAAGAACGT TTTCCAATGA 5050 
TGAGCACTTT TAAAGTTCTG CTATGTGGCG CGGTATTATC CCGTGATGAC 5100
GCCGGGCAAG AGCAACTCGG TCGCCGCATA CACTATTCTC AGAATGACTT 5150 
GGTTGAGTAC TCACCAGTCA CAGAAAAGCA TCTTACGGAT GGCATGACAG 5200
TAAGAGAATT ATGCAGTGCT GCCATAACCA TGAGTGATAA CACTGCGGCC 5250 
AACTTACTTC TGACAACGAT CGGAGGACCG AAGGAGCTAA CCGCTTTTTT 5300
GCACAACATG GGGGATCATG TAACTCGCCT TGATCGTTGG GAACCGGAGC 5350
TGAATGAAGC CATACCAAAC GACGAGCGTG ACACCACGAT GCCAGCAGCA 5400
ATGGCAACAA CGTTGCGCAA ACTATTAACT GGCGAACTAC TTACTCTAGC 5450
TTCCCGGCAA CAATTAATAG ACTGGATGGA GGCGGATAAA GTTGCAGGAC 5500 
CACTTCTGCG CTCGGCCCTT CCGGCTGGCT GGTTTATTGC TGATAAATCT 5550 
GGAGCCGGTG AGCGTGGGTC TCGCGGTATC ATTGCAGCAC TGGGGCCAGA 5600
TGGTAAGCCC TCCCGTATCG TAGTTATCTA CACGACGGGG AGTCAGGCAA 5650 
CTATGGATGA ACGAAATAGA CAGATCGCTG AGATAGGTGC CTCACTGATT 5700 
AAGCATTGGT AACTGTCAGA CCAAGTTTAC TCATATATAC TTTAGATTGA 5750 
TTTAAAACTT CATTTTTAAT TTAAAAGGAT CTAGGTGAAG ATCCTTTTTG 5800 
ATAATCTCAT GACCAAAATC CCTTAACGTG AGTTTTCGTT CCACTGAGCG 5850 
TCAGACCCCG TAGAAAAGAT CAAAGGATCT TCTTGAGATC CTTTTTTTCT 5900 
GCGCGTAATC TGCTGCTTGC AAACAAAAAA ACCACCGCTA CCAGCGGTGG 5950 
TTTGTTTGCC GGATCAAGAG CTACCAACTC TTTTTCCGAA GGTAACTGGC 6000
TTCAGCAGAG CGCAGATACC AAATACTGTC CTTCTAGTGT AGCCGTAGTT 6050 
AGGCCACCAC TTCAAGAACT CTGTAGCACC GCCTACATAC CTCGCTCTGC 6100 
TAATCCTGTT ACCAGTGGCT GCTGCCAGTG GCGATAAGTC GTGTCTTACC 6150 
GGGTTGGACT CAAGACGATA GTTACCGGAT AAGGCGCAGC GGTCGGGCTG 6200 
AACGGGGGGT TCGTGCACAC AGCCCAGCTT GGAGCGAACG ACCTACACCG 6250
AACTGAGATA CCTACAGCGT GAGCATTGAG AAAGCGCCAC GCTTCCCGAA 6300
GGGAGAAAGG CGGACAGGTA TCCGGTAAGC GGCAGGGTCG GAACAGGAGA 6350
GCGCACGAGG GAGCTTCCAG GGGGAAACGC CTGGTATCTT TATAGTCCTG 6400
TCGGGTTTCG CCACCTCTGA CTTGAGCGTC GATTTTTGTG ATGCTCGTCA 6450 
GGGGGGCGGA GCCTATGGAA AAACGCCAGC AACGCGGCCT TTTTACGGTT 6500
CCTGGCCTTT TGCTGGCCTT TTGCTCACAT GTTCTTTCCT GCGTTATCCC 6550 
CTGATTCTGT GGATAACCGT ATTACCGCCT TTGAGTGAGC TGATACCGCT 6600
CGCCGCAGCC GAACGACCGA GCGCAGCGAG TCAGTGAGCG AGGAAGCGGA 6650
AGAGCGCCCA ATACGCAAAC CGCCTCTCCC CGCGCGTTGG CCGATTCATT 6700 
AATCCAGCTG GCACGACAGG TTTCCCGACT GGAAAGCGGG CAGTGAGCGC 6750 
AACGCAATTA ATGTGAGTTA CCTCACTCAT TAGGCACCCC AGGCTTTACA 6800 
CTTTATGCTT CCGGCTCGTA TGTTGTGTGG AATTGTGAGC GGATAACAAT 6850 
TTCACACAGG AAACAGCTAT GACCATGATT ACGAATTAA 6889

(2) INFORMATION FOR SEQ ID NO: 3:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 6557 bases

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3:
TTCGAGCTCG CCCGACATTG ATTATTGACT AGAGTCGATC GACAGCTGTG 50
GAATGTGTGT CAGTTAGGGT GTGGAAAGTC CCCAGGCTCC CCAGCAGGCA 100 
GAAGTATGCA AAGCATGCAT CTCAATTAGT CAGCAACCAG GTGTGGAAAG 150 
TCCCCAGGCT CCCCAGCAGG CAGAAGTATG CAAAGCATGC ATCTCAATTA 200 
GTCAGCAACC ATAGTCCCGC CCCTAACTCC GCCCATCCCG CCCCTAACTC 250 
CGCCCAGTTC CGCCCATTCT CCGCCCCATG GCTGACTAAT TTTTTTTATT 300
TATGCAGAGG CCGAGGCCGC CTCGGCCTCT GAGCTATTCC AGAAGTAGTG 350 
AGGAGGCTTT TTTGGAGGCC TAGGCTTTTG CAAAAAGCTA GCTTATCCGG 400
CCGGGAACGG TGCATTGGAA CGCGGATTCC CCGTGCCAAG AGTGACGTAA 450
GTACCGCCTA TAGAGCGATA AGAGGATTTT ATCCCCGCTG CCATCATGGT 500 
TCGACCATTG AACTGCATCG TCGCCGTGTC CCAAAATATG GGGATTGGCA 550
AGAACGGAGA CCTACCCTGG CCTCCGCTCA GGAACGAGTT CAAGTACTTC 600
CAAAGAATGA CCACAACCTC TTCAGTGGAA GGTAAACAGA ATCTGGTGAT 650
TATGGGTAGG AAAACCTGGT TCTCCATTCC TGAGAAGAAT CGACCTTTAA 700
AGGACAGAAT TAATATAGTT CTCAGTAGAG AACTCAAAGA ACCACCACGA 750 
GGAGCTCATT TTCTTGCCAA AAGTTTGGAT GATGCCTTAA GACTTATTGA 800
ACAACCGGAA TTGGCAAGTA AAGTAGACAT GGTTTGGATA GTCGGAGGCA 850 
GTTCTGTTTA CCAGGAAGCC ATGAATCAAC CAGGCCACCT TAGACTCTTT 900 
GTGACAAGGA TCATGCAGGA ATTTGAAAGT GACACGTTTT TCCCAGAAAT 950
TGATTTGGGG AAATATAAAC CTCTCCCAGA ATACCCAGGC GTCCTCTCTG 1000
AGGTCCAGGA GGAAAAAGGC ATCAAGTATA AGTTTGAAGT CTACGAGAAG 1050 
AAAGACTAAC AGGAAGATGC TTTCAAGTTC TCTGCTCCCC TCCTAAAGCT 1100 
ATGCATTTTT ATAAGACCAT GGGACTTTTG CTGGCTTTAG ATCCCCTTGG 1150 
CTTCGTTAGA ACGCAGCTAC AATTAATACA TAACCTTATG TATCATACAC 1200
ATACGATTTA GGTGACACTA TAGATAACAT CCACTTTGCC TTTCTCTCCA 1250 
CAGGTGTCCA CTCCCAGGTC CAACTGCACC TCGGTTCTAT CGATTGAATT 1300
CCACCATGGG ATGGTCATGT ATCATCCTTT TTCTAGTAGC AACTGCAACT 1350 
GGAGTACATT CAGAAGTTCA GCTGGTGGAG TCTGGCGGTG GCCTGGTGCA 1400 
GCCAGGGGGC TCACTCCGTT TGTCCTGTGC AGTTTCTGGC TACTCCATCA 1450
CCTCCGGATA TAGCTGGAAC TGGATCCGTC AGGCCCCGGG TAAGGGCCTG 1500
GAATGGGTTG CATCGATTAC GTATGCCGGA TCGACTAACT ATAACCCTAG 1550
CGTCAAGGGC CGTATCACTA TAAGTCGCGA CGATTCCAAA AACACATTCT 1600
ACCTGCAGAT GAACAGCCTG CGTGCTGAGG ACACTGCCGT CTATTATTGT 1650
GCTCGAGGCA GCCACTATTT CGGCGCCTGG CACTTCGCCG TGTGGGGTCA 1700
AGGAACCCTG GTCACCGTCT CCTCGGCCTC CACCAAGGGC CCATCGGTCT 1750 
TCCCCCTGGC ACCCTCCTCC AAGAGCACCT CTGGGGGCAC AGCGGCCCTG 1800
GGCTGCCTGG TCAAGGACTA CTTCCCCGAA CCGGTGACGG TGTCGTGGAA 1850
CTCAGGCGCC CTGACCAGCG GCGTGCACAC CTTCCCGGCT GTCCTACAGT 1900 
CCTCAGGACT CTACTCCCTC AGCAGCGTGG TGACTGTGCC CTCTAGCAGC 1950 
TTGGGCACCC AGACCTACAT CTGCAACGTG AATCACAAGC CCAGCAACAC 2000
CAAGGTGGAC AAGAAAGTTG AGCCCAAATC TTGTGACAAA ACTCACACAT 2050 
GCCCACCGTG CCCAGCACCT GAACTCCTGG GGGGACCGTC AGTCTTCCTC 2100
TTCCCCCCAA AACCCAAGGA CACCCTCATG ATCTCCCGGA CCCCTGAGGT 2150 
CACATGCGTG GTGGTGGACG TGAGCCACGA AGACCCTGAG GTCAAGTTCA 2200 
ACTGGTACGT GGACGGCGTG GAGGTGCATA ATGCCAAGAC AAAGCCGCGG 2250 
GAGGAGCAGT ACAACAGCAC GTACCGTGTG GTCAGCGTCC TCACCGTCCT 2300 
GCACCAGGAC TGGCTGAATG GCAAGGAGTA CAAGTGCAAG GTCTCCAACA 2350
AAGCCCTCCC AGCCCCCATC GAGAAAACCA TCTCCAAAGC CAAAGGGCAG 2400 
CCCCGAGAAC CACAGGTGTA CACCCTGCCC CCATCCCGGG AAGAGATGAC 2450
CAAGAACCAG GTCAGCCTGA CCTGCCTGGT CAAAGGCTTC TATCCCAGCG 2500 
ACATCGCCGT GGAGTGGGAG AGCAATGGGC AGCCGGAGAA CAACTACAAG 2550 
ACCACGCCTC CCGTGCTGGA CTCCGACGGC TCCTTCTTCC TCTACAGCAA 2600
GCTCACCGTG GACAAGAGCA GGTGGCAGCA GGGGAACGTC TTCTCATGCT 2650
CCGTGATGCA TGAGGCTCTG CACAACCACT ACACGCAGAA GAGCCTCTCC 2700 
CTGTCTCCGG GTAAATGAGT GCGACGGCCC TAGAGTCGAC CTGCAGAAGC 2750 
TTGGCCGCCA TGGCCCAACT TGTTTATTGC AGCTTATAAT GGTTACAAAT 2800 
AAAGCAATAG CATCACAAAT TTCACAAATA AAGCATTTTT TTCACTGCAT 2850
TCTAGTTGTG GTTTGTCCAA ACTCATCAAT GTATCTTATC ATGTCTGGAT 2900
CGATCGGGAA TTAATTCGGC GCAGCACCAT GGCCTGAAAT AACCTCTGAA 2950
AGAGGAACTT GGTTAGGTAC CTTCTGAGGC GGAAAGAACC AGCTGTGGAA 3000
TGTGTGTCAG TTAGGGTGTG GAAAGTCCCC AGGCTCCCCA GCAGGCAGAA 3050 
GTATGCAAAG CATGCATCTC AATTAGTCAG CAACCAGGTG TGGAAAGTCC 3100 
CCAGGCTCCC CAGCAGGCAG AAGTATGCAA AGCATGCATC TCAATTAGTC 3150 
AGCAACCATA GTCCCGCCCC TAACTCCGCC CATCCCGCCC CTAACTCCGC 3200
CCAGTTCCGC CCATTCTCCG CCCCATGGCT GACTAATTTT TTTTATTTAT 3250
GCAGAGGCCG AGGCCGCCTC GGCCTCTGAG CTATTCCAGA AGTAGTGAGG 3300
AGGCTTTTTT GGAGGCCTAG GCTTTTGCAA AAAGCTGTTA CCTCGAGCGG 3350
CCGCTTAATT AAGGCGCGCC ATTTAAATCC TGCAGGTAAC AGCTTGGCAC 3400
TGGCCGTCGT TTTACAACGT CGTGACTGGG AAAACCCTGG CGTTACCCAA 3450
CTTAATCGCC TTGCAGCACA TCCCCCCTTC GCCAGCTGGC GTAATAGCGA 3500
AGAGGCCCGC ACCGATCGCC CTTCCCAACA GTTGCGTAGC CTGAATGGCG 3550
AATGGCGCCT GATGCGGTAT TTTCTCCTTA CGCATCTGTG CGGTATTTCA 3600
CACCGCATAC GTCAAAGCAA CCATAGTACG CGCCCTGTAG CGGCGCATTA 3650 
AGCGCGGCGG GTGTGGTGGT TACGCGCAGC GTGACCGCTA CACTTGCCAG 3700
CGCCCTAGCG CCCGCTCCTT TCGCTTTCTT CCCTTCCTTT CTCGCCACGT 3750
TCGCCGGCTT TCCCCGTCAA GCTCTAAATC GGGGGCTCCC TTTAGGGTTC 3800
CGATTTAGTG CTTTACGGCA CCTCGACCCC AAAAAACTTG ATTTGGGTGA 3850
TGGTTCACGT AGTGGGCCAT CGCCCTGATA GACGGTTTTT CGCCCTTTGA 3900
CGTTGGAGTC CACGTTCTTT AATAGTGGAC TCTTGTTCCA AACTGGAACA 3950
ACACTCAACC CTATCTCGGG CTATTCTTTT GATTTATAAG GGATTTTGCC 4000
GATTTCGGCC TATTGGTTAA AAAATGAGCT GATTTAACAA AAATTTAACG 4050
CGAATTTTAA CAAAATATTA ACGTTTACAA TTTTATGGTG CACTCTCAGT 4100
ACAATCTGCT CTGATGCCGC ATAGTTAAGC CAACTCCGCT ATCGCTACGT 4150
GACTGGGTCA TGGCTGCGCC CCGACACCCG CCAACACCCG CTGACGCGCC 4200 
CTGACGGGCT TGTCTGCTCC CGGCATCCGC TTACAGACAA GCTGTGACCG 4250 
TCTCCGGGAG CTGCATGTGT CAGAGGTTTT CACCGTCATC ACCGAAACGC 4300
GCGAGGCAGT ATTCTTGAAG ACGAAAGGGC CTCGTGATAC GCCTATTTTT 4350 
ATAGGTTAAT GTCATGATAA TAATGGTTTC TTAGACGTCA GGTGGCACTT 4400
TTCGGGGAAA TGTGCGCGGA ACCCCTATTT GTTTATTTTT CTAAATACAT 4450 
TCAAATATGT ATCCGCTCAT GAGACAATAA CCCTGATAAA TGCTTCAATA 4500
ATATTGAAAA AGGAAGAGTA TGAGTATTCA ACATTTCCGT GTCGCCCTTA 4550
TTCCCTTTTT TGCGGCATTT TGCCTTCCTG TTTTTGCTCA CCCAGAAACG 4600 
CTGGTGAAAG TAAAAGATGC TGAAGATCAG TTGGGTGCAC GAGTGGGTTA 4650
CATCGAACTG GATCTCAACA GCGGTAAGAT CCTTGAGAGT TTTCGCCCCG 4700 
AAGAACGTTT TCCAATGATG AGCACTTTTA AAGTTCTGCT ATGTGGCGCG 4750
GTATTATCCC GTGATGACGC CGGGCAAGAG CAACTCGGTC GCCGCATACA 4800 
CTATTCTCAG AATGACTTGG TTGAGTACTC ACCAGTCACA GAAAAGCATC 4850 
TTACGGATGG CATGACAGTA AGAGAATTAT GCAGTGCTGC CATAACCATG 4900
AGTGATAACA CTGCGGCCAA CTTACTTCTG ACAACGATCG GAGGACCGAA 4950 
GGAGCTAACC GCTTTTTTGC ACAACATGGG GGATCATGTA ACTCGCCTTG 5000 
ATCGTTGGGA ACCGGAGCTG AATGAAGCCA TACCAAACGA CGAGCGTGAC 5050 
ACCACGATGC CAGCAGCAAT GGCAACAACG TTGCGCAAAC TATTAACTGG 5100 
CGAACTACTT ACTCTAGCTT CCCGGCAACA ATTAATAGAC TGGATGGAGG 5150
CGGATAAAGT TGCAGGACCA CTTCTGCGCT CGGCCCTTCC GGCTGGCTGG 5200
TTTATTGCTG ATAAATCTGG AGCCGGTGAG CGTGGGTCTC GCGGTATCAT 5250
TGCAGCACTG GGGCCAGATG GTAAGCCCTC CCGTATCGTA GTTATCTACA 5300
CGACGGGGAG TCAGGCAACT ATGGATGAAC GAAATAGACA GATCGCTGAG 5350
ATAGGTGCCT CACTGATTAA GCATTGGTAA CTGTCAGACC AAGTTTACTC 5400
ATATATACTT TAGATTGATT TAAAACTTCA TTTTTAATTT AAAAGGATCT 5450 
AGGTGAAGAT CCTTTTTGAT AATCTCATGA CCAAAATCCC TTAACGTGAG 5500
TTTTCGTTCC ACTGAGCGTC AGACCCCGTA GAAAAGATCA AAGGATCTTC 5550
TTGAGATCCT TTTTTTCTGC GCGTAATCTG CTGCTTGCAA ACAAAAAAAC 5600
CACCGCTACC AGCGGTGGTT TGTTTGCCGG ATCAAGAGCT ACCAACTCTT 5650
TTTCCGAAGG TAACTGGCTT CAGCAGAGCG CAGATACCAA ATACTGTCCT 5700
TCTAGTGTAG CCGTAGTTAG GCCACCACTT CAAGAACTCT GTAGCACCGC 5750
CTACATACCT CGCTCTGCTA ATCCTGTTAC CAGTGGCTGC TGCCAGTGGC 5800
GATAAGTCGT GTCTTACCGG GTTGGACTCA AGACGATAGT TACCGGATAA 5850 
GGCGCAGCGG TCGGGCTGAA CGGGGGGTTC GTGCACACAG CCCAGCTTGG 5900
AGCGAACGAC CTACACCGAA CTGAGATACC TACAGCGTGA GCATTGAGAA 5950
AGCGCCACGC TTCCCGAAGG GAGAAAGGCG GACAGGTATC CGGTAAGCGG 6000
CAGGGTCGGA ACAGGAGAGC GCACGAGGGA GCTTCCAGGG GGAAACGCCT 6050
GGTATCTTTA TAGTCCTGTC GGGTTTCGCC ACCTCTGACT TGAGCGTCGA 6100 
TTTTTGTGAT GCTCGTCAGG GGGGCGGAGC CTATGGAAAA ACGCCAGCAA 6150 
CGCGGCCTTT TTACGGTTCC TGGCCTTTTG CTGGCCTTTT GCTCACATGT 6200 
TCTTTCCTGC GTTATCCCCT GATTCTGTGG ATAACCGTAT TACCGCCTTT 6250
GAGTGAGCTG ATACCGCTCG CCGCAGCCGA ACGACCGAGC GCAGCGAGTC 6300 
AGTGAGCGAG GAAGCGGAAG AGCGCCCAAT ACGCAAACCG CCTCTCCCCG 6350 
CGCGTTGGCC GATTCATTAA TCCAGCTGGC ACGACAGGTT TCCCGACTGG 6400 
AAAGCGGGCA GTGAGCGCAA CGCAATTAAT GTGAGTTACC TCACTCATTA 6450
GGCACCCCAG GCTTTACACT TTATGCTTCC GGCTCGTATG TTGTGTGGAA 6500 
TTGTGAGCGG ATAACAATTT CACACAGGAA ACAGCTATGA CCATGATTAC 6550
GAATTAA 6557 

(2) INFORMATION FOR SEQ ID NO: 4:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 7305 bases

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 4: 
TTCGAGCTCG CCCGACATTG ATTATTGACT AGTTATTAAT AGTAATCAAT 50 
TACGGGGTCA TTAGTTCATA GCCCATATAT GGAGTTCCGC GTTACATAAC 100 
TTACGGTAAA TGGCCCGCCT GGCTGACCGC CCAACGACCC CCGCCCATTG 150 
ACGTCAATAA TGACGTATGT TCCCATAGTA ACGCCAATAG GGACTTTCCA 200 
TTGACGTCAA TGGGTGGAGT ATTTACGGTA AACTGCCCAC TTGGCAGTAC 250
ATCAAGTGTA TCATATGCCA AGTACGCCCC CTATTGACGT CAATGACGGT 300
AAATGGCCCG CCTGGCATTA TGCCCAGTAC ATGACCTTAT GGGACTTTCC 350
TACTTGGCAG TACATCTACG TATTAGTCAT CGCTATTACC ATGGTGATGC 400 
GGTTTTGGCA GTACATCAAT GGGCGTGGAT AGCGGTTTGA CTCACGGGGA 450
TTTCCAAGTC TCCACCCCAT TGACGTCAAT GGGAGTTTGT TTTGGCACCA 500 
AAATCAACGG GACTTTCCAA AATGTCGTAA CAACTCCGCC CCATTGACGC 550 
AAATGGGCGG TAGGCGTGTA CGGTGGGAGG TCTATATAAG CAGAGCTCGT 600
TTAGTGAACC GTCAGATCGC CTGGAGACGC CATCCACGCT GTTTTGACCT 650
CCATAGAAGA CACCGGGACC GATCCAGCCT CCGCGGCCGG GAACGGTGCA 700
TTGGAACGCG GATTCCCCGT GCCAAGAGTG ACGTAAGTAC CGCCTATAGA 750
GTCTATAGGC CCACCCCCTT GGCTTCGTTA GAACGCGGCT ACAATTAATA 800
CATAACCTTA TGTATCATAC ACATACGATT TAGGTGACAC TATAGAATAA 850 
CATCCACTTT GCCTTTCTCT CCACAGGTGT CCACTCCCAG GTCCAACTGC 900 
ACCTCGGTTC TAAGCTTATC GATATGAAAA AGCCTGAACT CACCGCGACG 950 
TCTGTCGAGA AGTTTCTGAT CGAAAAGTTC GACAGCGTCT CCGACCTGAT 1000 
GCAGCTCTCG GAGGGCGAAG AATCTCGTGC TTTCAGCTTC GATGTAGGAG 1050
GGCGTGGATA TGTCCTGCGG GTAAATAGCT GCGCCGATGG TTTCTACAAA 1100 
GATCGTTATG TTTATCGGCA CTTTGCATCG GCCGCGCTCC CGATTCCGGA 1150 
AGTGCTTGAC ATTGGGGAAT TCAGCGAGAG CCTGACCTAT TGCATCTCCC 1200 
GCCGTGCACA GGGTGTCACG TTGCAACACC TGCCTGAAAC CGAACTGCCC 1250 
GCTGTTCTGC AGCCGGTCGC GGAGGCCATG GATGCGATCG CTGCGGCCGA 1300
TCTTAGCCAG ACGAGCGGGT TCGGCCCATT CGGACCGCAA GGAATCGGTC 1350
AATACACTAC ATGGCGTGAT TTCATATGCG CGATTGCTGA TCCCCATGTG 1400
TATCACTGGC AAACTGTGAT GGACGACACC GTCAGTGCGT CCGTCGCGCA 1450 
GGCTCTCGAT GAGCTGATGC TTTGGGCCGA GGACTGCCCC GAAGTCCGGC 1500 
ACCTCGTGCA CGCGGATTTC GGCTCCAACA ATGTCCTGAC GGACAATGGC 1550
CGCATAACAG CGGTCATTGA CTGGAGCGAG GCGATGTTCG GGGATTCCCA 1600 
ATACGAGGTC GCCAACATCT TCTTCTGGAG GCCGTGGTTG GCTTGTATGG 1650 
AGCAGCAGAC GTACTTCGAG CGGAGGCATC CGGAGCTTGC AGGATCGCCG 1700
CGGCTCCGGG CGTATATGCT CCGCATTGGT CTTGACCAAC TCTATCAGAG 1750 
CTTGGTTGAC GGCAATTTCG ATGATGCAGC TTGGGCGCAG GGTCGATGCG 1800
ACGCAATCGT CCGATCCGGA GCCGGGACTG TCGGGCGTAC ACAAATCGCC 1850 
CGCAGAAGCG CGGCCGTCTG GACCGATGGC TGTGTAGAAG TACTCGCCGA 1900 
TAGTGGAAAC CGACGCCCCA GCACTCGTCC GAGGGCAAAG GAATAGAGTA 1950 
GATGCCGACC GAAGGATCCC CGGGGAATTC AATCGATGGC CGCCATGGCC 2000 
CAACTTGTTT ATTGCAGCTT ATAATGGTTA CAAATAAAGC AATAGCATCA 2050 
CAAATTTCAC AAATAAAGCA TTTTTTTCAC TGCATTCTAG TTGTGGTTTG 2100 
TCCAAACTCA TCAATGTATC TTATCATGTC TGGATCGATC GGGAATTAAT 2150 
TCGGCGCAGC ACCATGGCCT GAAATAACCT CTGAAAGAGG AACTTGGTTA 2200 
GGTACCTTCT GAGGCGGAAA GAACCAGCTG TGGAATGTGT GTCAGTTAGG 2250 
GTGTGGAAAG TCCCCAGGCT CCCCAGCAGG CAGAAGTATG CAAAGCATGC 2300 
ATCTCAATTA GTCAGCAACC AGGTGTGGAA AGTCCCCAGG CTCCCCAGCA 2350
GGCAGAAGTA TGCAAAGCAT GCATCTCAAT TAGTCAGCAA CCATAGTCCC 2400 
GCCCCTAACT CCGCCCATCC CGCCCCTAAC TCCGCCCAGT TCCGCCCATT 2450 
CTCCGCCCCA TGGCTGACTA ATTTTTTTTA TTTATGCAGA GGCCGAGGCC 2500 
GCCTCGGCCT CTGAGCTATT CCAGAAGTAG TGAGGAGGCT TTTTTGGAGG 2550 
CCTAGGCTTT TGCAAAAAGC TAGCTTATCC GGCCGGGAAC GGTGCATTGG 2600
AACGCGGATT CCCCGTGCCA AGAGTCAGGT AAGTACCGCC TATAGAGTCT 2650
ATAGGCCCAC CCCCTTGGCT TCGTTAGAAC GCGGCTACAA TTAATACATA 2700
ACCTTTTGGA TCGATCCTAC TGACACTGAC ATCCACTTTT TCTTTTTCTC 2750
CACAGGTGTC CACTCCCAGG TCCAACTGCA CCTCGGTTCG CGAAGCTAGC 2800
TTGGGCTGCA TCGATTGAAT TCCACCATGG GATGGTCATG TATCATCCTT 2850
TTTCTAGTAG CAACTGCAAC TGGAGTACAT TCAGATATCC AGCTGACCCA 2900 
GTCCCCGAGC TCCCTGTCCG CCTCTGTGGG CGATAGGGTC ACCATCACCT 2950
GCCGTGCCAG TCAGAGCGTC GATTACGATG GTGATAGCTA CATGAACTGG 3000 
TATCAACAGA AACCAGGAAA AGCTCCGAAA CTACTGATTT ACGCGGCCTC 3050
GTACCTGGAG TCTGGAGTCC CTTCTCGCTT CTCTGGATCC GGTTCTGGGA 3100
CGGATTTCAC TCTGACCATC AGCAGTCTGC AGCCGGAAGA CTTCGCAACT 3150
TATTACTGTC AGCAAAGTCA CGAGGATCCG TACACATTTG GACAGGGTAC 3200
CAAGGTGGAG ATCAAACGAA CTGTGGCTGC ACCATCTGTC TTCATCTTCC 3250
CGCCATCTGA TGAGCAGTTG AAATCTGGAA CTGCCTCTGT TGTGTGCCTG 3300
CTGAATAACT TCTATCCCAG AGAGGCCAAA GTACAGTGGA AGGTGGATAA 3350
CGCCCTCCAA TCGGGTAACT CCCAGGAGAG TGTCACAGAG CAGGACAGCA 3400 
AGGACAGCAC CTACAGCCTC AGCAGCACCC TGACGCTGAG CAAAGCAGAC 3450 
TACGAGAAAC ACAAAGTCTA CGCCTGCGAA GTCACCCATC AGGGCCTGAG 3500 
CTCGCCCGTC ACAAAGAGCT TCAACAGGGG AGAGTGTTAA GCTTCGATGG 3550
CCGCCATGGC CCAACTTGTT TATTGCAGCT TATAATGGTT ACAAATAAAG 3600
CAATAGCATC ACAAATTTCA CAAATAAAGC ATTTTTTTCA CTGCATTCTA 3650 
GTTGTGGTTT GTCCAAACTC ATCAATGTAT CTTATCATGT CTGGATCGAT 3700
CGGGAATTAA TTCGGCGCAG CACCATGGCC TGAAATAACC TCTGAAAGAG 3750
GAACTTGGTT AGGTACCTTC TGAGGCGGAA AGAACCAGCT GTGGAATGTG 3800
TGTCAGTTAG GGTGTGGAAA GTCCCCAGGC TCCCCAGCAG GCAGAAGTAT 3850
GCAAAGCATG CATCTCAATT AGTCAGCAAC CAGGTGTGGA AAGTCCCCAG 3900
GCTCCCCAGC AGGCAGAAGT ATGCAAAGCA TGCATCTCAA TTAGTCAGCA 3950
ACCATAGTCC CGCCCCTAAC TCCGCCCATC CCGCCCCTAA CTCCGCCCAG 4000
TTCCGCCCAT TCTCCGCCCC ATGGCTGACT AATTTTTTTT ATTTATGCAG 4050
AGGCCGAGGC CGCCTCGGCC TCTGAGCTAT TCCAGAAGTA GTGAGGAGGC 4100
TTTTTTGGAG GCCTAGGCTT TTGCAAAAAG CTGTTAACAG CTTGGCACTG 4150
GCCGTCGTTT TACAACGTCG TGACTGGGAA AACCCTGGCG TTACCCAACT 4200
TAATCGCCTT GCAGCACATC CCCCCTTCGC CAGCTGGCGT AATAGCGAAG 4250
AGGCCCGCAC CGATCGCCCT TCCCAACAGT TGCGTAGCCT GAATGGCGAA 4300
TGGCGCCTGA TGCGGTATTT TCTCCTTACG CATCTGTGCG GTATTTCACA 4350
CCGCATACGT CAAAGCAACC ATAGTACGCG CCCTGTAGCG GCGCATTAAG 4400
CGCGGCGGGT GTGGTGGTTA CGCGCAGCGT GACCGCTACA CTTGCCAGCG 4450
CCCTAGCGCC CGCTCCTTTC GCTTTCTTCC CTTCCTTTCT CGCCACGTTC 4500
GCCGGCTTTC CCCGTCAAGC TCTAAATCGG GGGCTCCCTT TAGGGTTCCG 4550
ATTTAGTGCT TTACGGCACC TCGACCCCAA AAAACTTGAT TTGGGTGATG 4600 
GTTCACGTAG TGGGCCATCG CCCTGATAGA CGGTTTTTCG CCCTTTGACG 4650 
TTGGAGTCCA CGTTCTTTAA TAGTGGACTC TTGTTCCAAA CTGGAACAAC 4700 
ACTCAACCCT ATCTCGGGCT ATTCTTTTGA TTTATAAGGG ATTTTGCCGA 4750
TTTCGGCCTA TTGGTTAAAA AATGAGCTGA TTTAACAAAA ATTTAACGCG 4800
AATTTTAACA AAATATTAAC GTTTACAATT TTATGGTGCA CTCTCAGTAC 4850
AATCTGCTCT GATGCCGCAT AGTTAAGCCA ACTCCGCTAT CGCTACGTGA 4900
CTGGGTCATG GCTGCGCCCC GACACCCGCC AACACCCGCT GACGCGCCCT 4950
GACGGGCTTG TCTGCTCCCG GCATCCGCTT ACAGACAAGC TGTGACCGTC 5000
TCCGGGAGCT GCATGTGTCA GAGGTTTTCA CCGTCATCAC CGAAACGCGC 5050
GAGGCAGTAT TCTTGAAGAC GAAAGGGCCT CGTGATACGC CTATTTTTAT 5100 
AGGTTAATGT CATGATAATA ATGGTTTCTT AGACGTCAGG TGGCACTTTT 5150 
CGGGGAAATG TGCGCGGAAC CCCTATTTGT TTATTTTTCT AAATACATTC 5200 
AAATATGTAT CCGCTCATGA GACAATAACC CTGATAAATG CTTCAATAAT 5250
ATTGAAAAAG GAAGAGTATG AGTATTCAAC ATTTCCGTGT CGCCCTTATT 5300
CCCTTTTTTG CGGCATTTTG CCTTCCTGTT TTTGCTCACC CAGAAACGCT 5350
GGTGAAAGTA AAAGATGCTG AAGATCAGTT GGGTGCACGA GTGGGTTACA 5400
TCGAACTGGA TCTCAACAGC GGTAAGATCC TTGAGAGTTT TCGCCCCGAA 5450 
GAACGTTTTC CAATGATGAG CACTTTTAAA GTTCTGCTAT GTGGCGCGGT 5500 
ATTATCCCGT GATGACGCCG GGCAAGAGCA ACTCGGTCGC CGCATACACT 5550 
ATTCTCAGAA TGACTTGGTT GAGTACTCAC CAGTCACAGA AAAGCATCTT 5600
ACGGATGGCA TGACAGTAAG AGAATTATGC AGTGCTGCCA TAACCATGAG 5650
TGATAACACT GCGGCCAACT TACTTCTGAC AACGATCGGA GGACCGAAGG 5700
AGCTAACCGC TTTTTTGCAC AACATGGGGG ATCATGTAAC TCGCCTTGAT 5750 
CGTTGGGAAC CGGAGCTGAA TGAAGCCATA CCAAACGACG AGCGTGACAC 5800 
CACGATGCCA GCAGCAATGG CAACAACGTT GCGCAAACTA TTAACTGGCG 5850
AACTACTTAC TCTAGCTTCC CGGCAACAAT TAATAGACTG GATGGAGGCG 5900
GATAAAGTTG CAGGACCACT TCTGCGCTCG GCCCTTCCGG CTGGCTGGTT 5950
TATTGCTGAT AAATCTGGAG CCGGTGAGCG TGGGTCTCGC GGTATCATTG 6000 
CAGCACTGGG GCCAGATGGT AAGCCCTCCC GTATCGTAGT TATCTACACG 6050 
ACGGGGAGTC AGGCAACTAT GGATGAACGA AATAGACAGA TCGCTGAGAT 6100
AGGTGCCTCA CTGATTAAGC ATTGGTAACT GTCAGACCAA GTTTACTCAT 6150
ATATACTTTA GATTGATTTA AAACTTCATT TTTAATTTAA AAGGATCTAG 6200
GTGAAGATCC TTTTTGATAA TCTCATGACC AAAATCCCTT AACGTGAGTT 6250 
TTCGTTCCAC TGAGCGTCAG ACCCCGTAGA AAAGATCAAA GGATCTTCTT 6300 
GAGATCCTTT TTTTCTGCGC GTAATCTGCT GCTTGCAAAC AAAAAAACCA 6350
CCGCTACCAG CGGTGGTTTG TTTGCCGGAT CAAGAGCTAC CAACTCTTTT 6400
TCCGAAGGTA ACTGGCTTCA GCAGAGCGCA GATACCAAAT ACTGTCCTTC 6450
TAGTGTAGCC GTAGTTAGGC CACCACTTCA AGAACTCTGT AGCACCGCCT 6500
ACATACCTCG CTCTGCTAAT CCTGTTACCA GTGGCTGCTG CCAGTGGCGA 6550
TAAGTCGTGT CTTACCGGGT TGGACTCAAG ACGATAGTTA CCGGATAAGG 6600
CGCAGCGGTC GGGCTGAACG GGGGGTTCGT GCACACAGCC CAGCTTGGAG 6650 
CGAACGACCT ACACCGAACT GAGATACCTA CAGCGTGAGC ATTGAGAAAG 6700 
CGCCACGCTT CCCGAAGGGA GAAAGGCGGA CAGGTATCCG GTAAGCGGCA 6750
GGGTCGGAAC AGGAGAGCGC ACGAGGGAGC TTCCAGGGGG AAACGCCTGG 6800
TATCTTTATA GTCCTGTCGG GTTTCGCCAC CTCTGACTTG AGCGTCGATT 6850
TTTGTGATGC TCGTCAGGGG GGCGGAGCCT ATGGAAAAAC GCCAGCAACG 6900 
CGGCCTTTTT ACGGTTCCTG GCCTTTTGCT GGCCTTTTGC TCACATGTTC 6950
TTTCCTGCGT TATCCCCTGA TTCTGTGGAT AACCGTATTA CCGCCTTTGA 7000
GTGAGCTGAT ACCGCTCGCC GCAGCCGAAC GACCGAGCGC AGCGAGTCAG 7050
TGAGCGAGGA AGCGGAAGAG CGCCCAATAC GCAAACCGCC TCTCCCCGCG 7100
CGTTGGCCGA TTCATTAATC CAGCTGGCAC GACAGGTTTC CCGACTGGAA 7150
AGCGGGCAGT GAGCGCAACG CAATTAATGT GAGTTACCTC ACTCATTAGG 7200
CACCCCAGGC TTTACACTTT ATGCTTCCGG CTCGTATGTT GTGTGGAATT 7250
GTGAGCGGAT AACAATTTCA CACAGGAAAC AGCTATGACC ATGATTACGA 7300
ATTAA 7305 
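
Each listing above prints five 10-base groups per row, followed by the running count of bases through the end of that row, so a listing can be reassembled and checked mechanically. The following is a minimal Python sketch, assuming rows that have already been cleaned of margin numbers and stray spaces; the `parse_listing` function and the row pattern are hypothetical illustrations, not part of the original disclosure.

```python
import re

# Hypothetical row pattern: one to five groups of up to 10 bases, a space,
# then the running base count printed at the end of each listing row.
ROW = re.compile(r"^((?:[ACGT]{1,10} ){0,4}[ACGT]{1,10}) (\d+)$")


def parse_listing(rows):
    """Join sequence-listing rows into one string, checking each running count."""
    parts = []
    total = 0
    for row in rows:
        match = ROW.match(row.strip())
        if match is None:
            raise ValueError(f"unrecognised row: {row!r}")
        bases = match.group(1).replace(" ", "")
        total += len(bases)
        if total != int(match.group(2)):
            raise ValueError(f"count {match.group(2)} does not match {total} bases")
        parts.append(bases)
    return "".join(parts)


# First two rows of SEQ ID NO: 2.
rows = [
    "TTCGAGCTCG CCCGACATTG ATTATTGACT AGTTATTAAT AGTAATCAAT 50",
    "TACGGGGTCA TTAGTTCATA GCCCATATAT GGAGTTCCGC GTTACATAAC 100",
]
print(len(parse_listing(rows)))  # 100
```

A count mismatch flags a dropped or duplicated base, and the final count can be compared with the LENGTH field of the corresponding header (for example, 7305 bases for SEQ ID NO: 4).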

CLAIMS

1. A DNA construct comprising a transcriptional initiation site, a
transcriptional termination site, a selectable gene, a product gene
provided 3' to the selectable gene, a transcriptional regulatory
region regulating transcription of both the selectable gene and the
product gene, the selectable gene being positioned within an intron
having a splice donor site 5' of the intron, which splice donor site
regulates expression of the product gene using the transcriptional
regulatory region.

2. The DNA construct of claim 1 wherein the splice donor site comprises
an efficient splice donor sequence. 

3. The DNA construct of claim 2 wherein the splice donor site comprises
a consensus splice donor sequence. 

4. The DNA construct of claim 2 wherein the splice donor site comprises 
the sequence GACGTAAGT. 

5. The DNA construct of claim 1 wherein the selectable gene is an 
amplifiable gene. 

6. The DNA construct of claim 5 wherein the amplifiable gene is DHFR.

7. The DNA construct of claim 1 wherein the transcriptional regulatory 
region comprises a promoter and an enhancer. 

8. A vector comprising the DNA construct of claim 1. 

9. The vector of claim 8 wherein the selectable gene of the DNA 
construct is an amplifiable gene. 

10. The vector of claim 8 that is capable of replication in a eukaryotic
host.

11. A eukaryotic host cell comprising the vector of claim 10.

12. A eukaryotic host cell comprising the DNA construct of claim 5. 

13. The host cell of claim 11 wherein the vector is introduced into the 
host cell by electroporation.

14. A eukaryotic host cell comprising the DNA construct of claim 1
integrated into a chromosome of the host cell.

15. The host cell of claim 14 that is a mammalian cell.

16. A method for producing a product of interest comprising culturing the
host cell of claim 11 so as to express the product gene and
recovering the product from the host cell culture.

17. The method of claim 16 further comprising recovering the product from 
the culture medium. 

18. The method of claim 16 wherein the selectable gene is an amplifiable
gene and the splice donor site comprises an efficient splice donor
sequence.

19. A method for producing a product of interest comprising culturing the
host cell of claim 12 so as to express the product gene in a
selective medium comprising an amplifying agent for sufficient time
to allow amplification to occur, and recovering the product.

20. A method for producing eukaryotic cells having multiple copies of a
product gene comprising transforming eukaryotic cells with the DNA
construct of claim 5, growing the cells in a selective medium
comprising an amplifying agent for a sufficient time for
amplification to occur, and selecting cells having multiple copies
of the product gene.

21. The method of claim 20 further comprising recovering from the 
selected cells the product of interest.
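
Claim 4 recites the splice donor sequence GACGTAAGT. As an illustration only, the Python sketch below reports every position at which that exact nine-base string occurs in a reassembled sequence; the `find_donor_sites` function is a hypothetical name. In SEQ ID NO: 4, for instance, the string begins at position 730, inside the row ending at 750.

```python
# Splice donor sequence recited in claim 4.
DONOR = "GACGTAAGT"


def find_donor_sites(seq, motif=DONOR):
    """Return 1-based start positions of every exact (possibly overlapping) match."""
    seq = seq.upper()
    hits = []
    start = seq.find(motif)
    while start != -1:
        hits.append(start + 1)
        start = seq.find(motif, start + 1)
    return hits


# Row 701-750 of SEQ ID NO: 4, with the group spaces removed.
row = "TTGGAACGCGGATTCCCCGTGCCAAGAGTGACGTAAGTACCGCCTATAGA"
offset = 700  # bases preceding this row in the full listing
print([offset + p for p in find_donor_sites(row)])  # [730]
```

An exact string match is deliberately narrow: it does not score the broader efficient or consensus splice donor sequences of claims 2 and 3, which would call for a position weight matrix or a dedicated splice-site predictor instead.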